Preface to the Third Edition 


A First Course in Abstract Algebra introduces groups and commutative rings. 
Group theory was invented by E. Galois in the early 1800s, when he used groups 
to completely determine when the roots of polynomials can be found by formulas 
generalizing the quadratic formula. Nowadays, groups are the precise way to dis- 
cuss various types of symmetry, both in geometry and elsewhere. Besides intro- 
ducing Galois’ ideas, we also apply groups to some intricate counting problems 
as well as to the classification of friezes in the plane. Commutative rings provide 
the proper context in which to study number theory as well as many aspects of 
the theory of polynomials. For example, generalizations of ideas such as greatest 
common divisor and modular arithmetic extend effortlessly to polynomial rings 
over fields. Applications include public access codes, finite fields, magic squares, 
Latin squares, and calendars. We then consider vector spaces with scalars in ar- 
bitrary fields (not just the reals), and this study allows us to solve the classical 
Greek problems concerning angle trisection, doubling the cube, squaring the 
circle, and construction of regular n-gons. Linear algebra over finite fields is 
applied to codes, showing how one can accurately decode messages sent over a 
noisy channel (for example, photographs sent to Earth from Mars or from Sat- 
urn). Here, one sees finite fields being used in an essential way. In Chapter 5, 
we give the classical formulas for the roots of cubic and quartic polynomials, 
after which both groups and commutative rings together are used to prove Ga- 
lois’ theorem (polynomials whose roots are obtainable by such formulas have 
solvable Galois groups) and Abel’s theorem (there is no generalization of these 
formulas to polynomials of higher degree). This is only an introduction to Galois 
theory; readers wishing to learn more of this beautiful subject will have to see 
a more advanced text. For those readers whose appetites have been whetted by 
these results, the last two chapters investigate groups and rings further: we prove 
the basis theorem for finite abelian groups and the Sylow theorems, and we in- 
troduce the study of polynomials in several variables: varieties; Hilbert’s basis 
theorem, the Nullstellensatz, and algorithmic methods associated with Gröbner 
bases. 

Let me mention some new features of this edition. I have rewritten the text, 
adding more exercises and trying to make the exposition smoother. The following 
changes in format should make the book more convenient to use. Every 
exercise explicitly cited elsewhere in the text is marked by an asterisk; moreover, 
every citation gives the page number on which the cited exercise appears. Hints 
for certain exercises are in a section at the end of the book so that readers may 
consider problems on their own before reading hints. One numbering system 
enumerates all lemmas, theorems, propositions, corollaries, and examples, so 
that finding back references is easy. There are several pages of Special Notation, 
giving page numbers where notation is introduced. 

Today, abstract algebra is viewed as a challenging course; many bright stu- 
dents seem to have inordinate difficulty learning it. Certainly, they must learn 
to think in a new way. Axiomatic reasoning may be new to some; others may 
be more visually oriented. Some students have never written proofs; others may 
have once done so, but their skills have atrophied from lack of use. But none of 
these obstacles adequately explains the observed difficulties. After all, the same 
obstacles exist in beginning real analysis courses, but most students in these 
courses do learn the material, perhaps after some early struggling. However, the 
difficulty of standard algebra courses persists, whether groups are taught first, 
whether rings are taught first, or whether texts are changed. I believe that a ma- 
jor contributing factor to the difficulty in learning abstract algebra is that both 
groups and rings are introduced in the first course; as soon as a student begins to 
be comfortable with one topic, it is dropped to study the other. Furthermore, if 
one leaves group theory or commutative ring theory before significant applica- 
tions can be given, then students are left with the false impression that the theory 
is either of no real value or, more likely, that it cannot be appreciated until some 
future indefinite time. (Imagine a beginning analysis course in which both real 
and complex analysis are introduced in one semester.) If algebra is taught as 
a one-year (two-semester) course, there is no longer any reason to crowd both 
topics into the first course, and a truer, more attractive, picture of algebra is 
presented. This option is more practical today than in the past, for the many ap- 
plications of abstract algebra have increased the numbers of interested students, 
many of whom are working in other disciplines. 

I have rewritten this text for two audiences. This new edition can serve as a 
text for those who wish to continue teaching the currently popular arrangement 
of introducing both groups and rings in the first semester. As usual, one begins 
by covering most of Chapter 1, after which one chooses selected parts of Chapters 
2 and 3, depending on whether groups or commutative rings are taught first. 
Chapters 2 and 3 have been rewritten, and they are now essentially independent 
of one another, so that this book may be used for either order of presentation. 
(As an aside, I disagree with the current received wisdom that doing groups first 
is more efficient than doing rings first; for example, the present version of Chap- 
ter 3 is about the same length as its earlier versions.) There is ample material in 
the book so that it can further serve as a text for a sequel course as well. 

Let me now address a second audience: those willing to try a new approach. 
My own ideas about teaching abstract algebra have changed, and I now think that 
a two-semester course in which only one of groups or rings is taught in the first 
semester is best. I recommend a one-year course whose first semester covers 
number theory and commutative rings, and whose second semester covers lin- 
ear algebra and group theory. In more detail, the first semester should treat the 
usual selection of arithmetic theorems in Chapter 1: division algorithm; gcd’s; 
euclidean algorithm; unique factorization; congruence; Chinese remainder the- 
orem. Continue with Section 2.1: functions; inverse functions; equivalence re- 
lations, and then commutative rings in Chapter 3: fraction fields of domains; 
generalizations of arithmetic theorems to polynomials; ideals; integers mod m; 
isomorphism theorems; splitting fields, existence of finite fields, magic squares, 
orthogonal Latin squares. One could instead continue on in Chapter 2, covering 
group theory instead of commutative rings, but I think that doing commutative 
rings first is more user-friendly. It is natural to pass from $\mathbb{Z}$ to $k[x]$, and one can 
watch how the notion of ideal develops from a technique showing that gcd’s are 
linear combinations into an important idea. 

For the second semester, I recommend beginning with portions of Chapter 4: 
linear algebra over arbitrary fields: invariance of dimension; ruler-compass con- 
structions; matrices and linear transformations; determinants over commutative 
rings. Most of this material can be done quickly if the students have completed 
an earlier linear algebra course treating vector spaces over R. If time permits, 
one can read the section on codes, which culminates with a proof that Reed- 
Solomon codes can be decoded. The remainder of the semester should discuss 
groups, as in Chapter 2: permutations; symmetries of planar figures; Lagrange’s 
theorem; isomorphism theorems; group actions; Burnside counting; and frieze 
groups, as in Chapter 6. If there is not ample time to cover codes and frieze 
groups, these sections are appropriate special projects for interested students. I 
prefer this organization and presentation, and I believe that it is an improvement 
over that of standard courses. 

Giving the etymology of mathematical terms is rarely done. Let me explain, 
with an analogy, why I have included derivations of many terms. There are many 
variations of standard poker games and, in my poker group, the dealer announces 
the game of his choice by naming it. Now some names are better than others. 
For example, “Little Red” is a game in which one’s smallest red card is wild; this 
is a good name because it reminds the players of its distinctive feature. On the 
other hand, “Aggravation” is not such a good name, for though it is, indeed, sug- 
gestive, the name does not distinguish this particular game from several others. 
Most terms in mathematics have been well chosen; there are more red names than 
aggravating ones. An example of a good name is even permutation, for a per- 
mutation is even if it is a product of an even number of transpositions. Another 
example of a good term is the parallelogram law describing vector addition. But 
many good names, clear when they were chosen, are now obscure because their 
roots are either in another language or in another discipline. The trigonomet- 
ric terms tangent and secant are good names for those knowing some Latin, but 
they are obscure otherwise (see a discussion of their etymology on page 31). The 
term mathematics is obscure only because most of us do not know that it comes 
from the classical Greek word meaning “to learn.” The term corollary is doubly 
obscure; it comes from the Latin word meaning “flower,” but why should some 
theorems be called flowers? A plausible explanation is that it was common, in 
ancient Rome, to give flowers as gifts, and so a corollary is a gift bequeathed by 
a theorem. The term theorem comes from the Greek word meaning “to watch” 
or “to contemplate” (theatre has the same root); it was used by Euclid with its 
present meaning. The term lemma comes from the Greek word meaning “taken” 
or “received;” it is a statement that is taken for granted (for it has already been 
proved) in the course of proving a theorem. I believe that etymology of terms 
is worthwhile (and interesting!), for it often aids understanding by removing un- 
necessary obscurity. 

In addition to thanking again those who helped me with the first two editions, 
it is a pleasure to thank George Bergman and Chris Heil for their valuable com- 
ments on the second edition. I also thank Iwan Duursma, Robert Friedman, Blair 
F. Goodlin, Dieter Roller, Fatma Irem Koprulu, J. Peter May, Leon McCulloh, 
Arnold Miller, Brent B. Solie, and John Wetzel. 


Joseph J. Rotman 
Number Theory 


1.1 Induction 

There are many styles of proof, and mathematical induction is one of them. We 
begin by saying what mathematical induction is not. In the natural sciences, 
inductive reasoning is the assertion that a frequently observed phenomenon will 
always occur. Thus, one says that the Sun will rise tomorrow morning because, 
from the dawn of time, the Sun has risen every morning. This is not a legitimate 
kind of proof in mathematics, for even though a phenomenon has been observed 
many times, it need not occur forever. However, inductive reasoning is still valu- 
able in mathematics, as it is in natural science, because seeing patterns in data 
often helps in guessing what may be true in general. 

On the other hand, a reasonable guess may not be correct. For example, what 
is the maximum number of regions into which $\mathbb{R}^3$ (3-dimensional space) can be 
divided by n planes? Two nonparallel planes can divide $\mathbb{R}^3$ into 4 regions, and 
three planes can divide $\mathbb{R}^3$ into 8 regions (octants). For smaller n, we note that 
a single plane divides $\mathbb{R}^3$ into 2 regions, while if n = 0, then $\mathbb{R}^3$ is not divided 
at all: there is 1 region. For n = 0, 1, 2, 3, the maximum number of regions is 
thus 1, 2, 4, 8, and it is natural to guess that n planes can be chosen to divide $\mathbb{R}^3$ 
into $2^n$ regions. But it turns out that any four chosen planes can divide $\mathbb{R}^3$ into at 
most 15 regions! 
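
To see the guess fail numerically, here is a small sketch in Python. It assumes the standard closed formula for the maximum number of regions cut out by n planes in general position, namely C(n,0) + C(n,1) + C(n,2) + C(n,3); that formula is not derived in the text above, and the function name `max_regions` is only an illustrative choice.

```python
from math import comb


def max_regions(n: int) -> int:
    """Maximum number of regions into which n planes can divide 3-space.

    Assumption: uses the standard general-position formula
    C(n,0) + C(n,1) + C(n,2) + C(n,3), stated here without proof.
    """
    return sum(comb(n, i) for i in range(4))


# Compare the true maximum with the tempting guess 2**n.
counts = [max_regions(n) for n in range(6)]
guesses = [2 ** n for n in range(6)]
print(counts)   # [1, 2, 4, 8, 15, 26]
print(guesses)  # [1, 2, 4, 8, 16, 32]
```

The two lists agree for n = 0, 1, 2, 3 and part ways at n = 4, which is exactly the point of the example: a pattern seen in a few cases is evidence, not proof.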

Before proceeding further, let us make sure that we agree on the meaning of 
some standard terms. An integer is one of the numbers 0, 1, -1, 2, -2, 3, ...; 
the set of all the integers is denoted by $\mathbb{Z}$ (from the German Zahl meaning number): 

$\mathbb{Z} = \{0, 1, -1, 2, -2, 3, \ldots\}.$ 

The natural numbers consist of all those integers n for which $n \ge 0$: 

$\mathbb{N} = \{n \in \mathbb{Z} : n \ge 0\} = \{0, 1, 2, 3, \ldots\}.$ 
Definition. An integer d is a divisor of an integer n if n = da for some inte- 
ger a. An integer n is called prime¹ if $n \ge 2$ and its only divisors are ±1 and 
±n; an integer n is called composite if it is not prime. 

If a positive integer n is composite, then it has a factorization n = ab, where 
a < n and b < n are positive integers; the inequalities are present to eliminate 
the uninteresting factorization $n = n \times 1$. The first few primes are 2, 3, 5, 7, 
11, 13, 17, 19, 23, 29, 31, 37, 41, ...; that this sequence never ends is proved in 
Corollary 1.30. 

Consider the assertion that 

$f(n) = n^2 - n + 41$ 

is prime for every positive integer n. Evaluating f(n) for n = 1, 2, 3, ..., 40 
gives the numbers 

41, 43, 47, 53, 61, 71, 83, 97, 113, 131, 
151, 173, 197, 223, 251, 281, 313, 347, 383, 421, 
461, 503, 547, 593, 641, 691, 743, 797, 853, 911, 
971, 1033, 1097, 1163, 1231, 1301, 1373, 1447, 1523, 1601. 

It is tedious, but not very difficult, to show that every one of these numbers is 
prime (see Proposition 1.3). Inductive reasoning predicts that all the numbers of 
the form f(n) are prime. But the next number, f(41) = 1681, is not prime, for 
$f(41) = 41^2 - 41 + 41 = 41^2$, which is obviously composite. Thus, inductive 
reasoning is not appropriate for mathematical proofs. 

Here is an even more spectacular example (which I first saw in an article by 
W. Sierpinski). Recall that perfect squares are numbers of the form $n^2$, where n 
is an integer; the first few perfect squares are 0, 1, 4, 9, 16, 25, 36, .... For each 
$n \ge 1$, consider the statement 

$S(n)$: $991n^2 + 1$ is not a perfect square. 

The nth statement, S(n), is true for many n; in fact, the smallest number n for 
which S(n) is false is 

$n = 12{,}055{,}735{,}790{,}331{,}359{,}447{,}442{,}538{,}767 \approx 1.2 \times 10^{28}.$ 

The equation $m^2 = 991n^2 + 1$ is an example of Pell’s equation (an equation 
of the form $m^2 = pn^2 + 1$, where p is prime), and there is a way of calculating 
all possible solutions of it. An even larger example involves the prime 

1 One reason the number 1 is not called a prime is that many theorems involving primes 
would otherwise be more complicated to state. 
p = 1,000,099; the smallest n for which $1{,}000{,}099n^2 + 1$ is a perfect square 
has 1116 digits. The most generous estimate of the age of the Earth is 10 billion 
(10,000,000,000) years, or $3.65 \times 10^{12}$ days, a number insignificant when compared 
to $1.2 \times 10^{28}$, let alone $10^{1115}$. If, starting from the Earth’s very first day, 
one verified statement S(n) on the nth day, then there would be today as much 
evidence of the general truth of these statements as there is that the Sun will rise 
tomorrow morning. And yet some of the statements S(n) are false! 
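
The text says only that there is a way of calculating the solutions of Pell's equation. One classical way, sketched below in Python, walks through the continued fraction expansion of the square root of d until a convergent m/n satisfies $m^2 - dn^2 = 1$. The choice of method and the name `pell_fundamental` are assumptions made for illustration; nothing here is taken from the book's own treatment.

```python
from math import isqrt


def pell_fundamental(d: int) -> tuple[int, int]:
    """Smallest positive (m, n) with m*m - d*n*n == 1, for non-square d > 0.

    Sketch only: uses the convergents h/k of the continued fraction of
    sqrt(d), generated by the usual integer recurrences.
    """
    a0 = isqrt(d)
    if a0 * a0 == d:
        raise ValueError("d must not be a perfect square")
    m, q, a = 0, 1, a0          # state of the continued fraction expansion
    h_prev, h = 1, a0           # numerators of successive convergents
    k_prev, k = 0, 1            # denominators of successive convergents
    while h * h - d * k * k != 1:
        m = q * a - m
        q = (d - m * m) // q
        a = (a0 + m) // q
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
    return h, k


m, n = pell_fundamental(991)
print(n)                          # the smallest n for which S(n) fails
print(m * m == 991 * n * n + 1)   # True
```

Python integers have arbitrary precision, so the 29-digit value of n causes no overflow; a brute-force search over n = 1, 2, 3, ... would never finish.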

As a final example, let us consider the following statement, known as Goldbach’s 
conjecture: every even number $m \ge 4$ is a sum of two primes. No one has 
ever found a counterexample to Goldbach’s conjecture, but neither has anyone 
ever proved it. At present, the conjecture has been verified for all even numbers 
$m < 10^{13}$, and it has been proved by J.-R. Chen that every sufficiently large even 
number m can be written as p + q, where p is prime and q is “almost” a prime; 
that is, q is either a prime or a product of two primes. Even with all of this pos- 
itive evidence, however, no mathematician will say that Goldbach’s conjecture 
must, therefore, be true for all even m. 

We have seen what (mathematical) induction is not; let us now discuss what 
induction is. Our discussion is based on the following property of the set of 
natural numbers (usually called the Well Ordering Principle). 

Least Integer Axiom. There is a smallest integer in every nonempty² subset 
C of the natural numbers $\mathbb{N}$. 

Although this axiom cannot be proved (it arises in analyzing what integers 
are), it is certainly plausible. Consider the following procedure: check whether 
0 belongs to C; if it does, then 0 is the smallest integer in C. Otherwise, check 
whether 1 belongs to C; if it does, then 1 is the smallest integer in C; if not, 
check 2. Continue this procedure until one bumps into C; this will occur eventu- 
ally because C is nonempty. 
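
The checking procedure just described can be sketched in a few lines of Python. The subset C is represented by a membership test `contains`, which is a modelling assumption of this sketch (as is the name `least_element`); the loop terminates only when C is nonempty, just as the paragraph above requires.

```python
from itertools import count
from typing import Callable


def least_element(contains: Callable[[int], bool]) -> int:
    """Return the smallest natural number n for which contains(n) is true.

    Checks 0, then 1, then 2, and so on. If no natural number passes the
    test (that is, if the set is empty), this loop never returns.
    """
    for n in count(0):
        if contains(n):
            return n


# Example: C = {n : n*n > 50}; its smallest element is 8.
print(least_element(lambda n: n * n > 50))  # 8
```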

Proposition 1.1 (Least Criminal). Let k be a natural number, and let S(k), 
S(k + 1), . . . , S(n), . . . be a list of statements. If some of these statements are 
false, then there is a first false statement. 

Proof. Let C be the set of all those natural numbers $n \ge k$ for which S(n) is 
false; by hypothesis, C is a nonempty subset of $\mathbb{N}$. The Least Integer Axiom 
provides a smallest integer m in C, and S(m) is the first false statement. • 

This seemingly innocuous proposition is useful. 

Theorem 1.2. Every integer $n \ge 2$ is either a prime or a product of primes. 

2 Saying that C is nonempty merely means that there is at least one integer in C. 
Proof. Were this not so, there would be “criminals”: there are integers $n \ge 2$ 
which are neither primes nor products of primes; a least criminal m is the 
smallest such integer. Since m is not a prime, it is composite; there is thus a 
factorization m = ab with $2 \le a < m$ and $2 \le b < m$ (since a is an integer, 
1 < a implies $2 \le a$). Since m is the least criminal, both a and b are “honest,” 
i.e., 

$a = p p' p'' \cdots$ and $b = q q' q'' \cdots$, 

where the factors $p, p', p'', \ldots$ and $q, q', q'', \ldots$ are primes. Therefore, 

$m = ab = p p' p'' \cdots q q' q'' \cdots$ 

is a product of (at least two) primes, which is a contradiction.³ • 

Proposition 1.3. If $m \ge 2$ is a positive integer which is not divisible by any 
prime p with $p \le \sqrt{m}$, then m is a prime. 

Proof. If m is not prime, then m = ab, where a < m and b < m are positive 
integers. If $a > \sqrt{m}$ and $b > \sqrt{m}$, then $m = ab > \sqrt{m}\sqrt{m} = m$, a contradiction. 
Therefore, we may assume that $a \le \sqrt{m}$. By Theorem 1.2, a is either a 
prime or a product of primes, and any (prime) divisor p of a is also a divisor of 
m. Thus, if m is not prime, then it has a “small” prime divisor p; i.e., $p \le \sqrt{m}$. 
The contrapositive says that if m has no small prime divisor, then m is prime. • 

Proposition 1.3 can be used to show that 991 is a prime. It suffices to check 
whether 991 is divisible by some prime p with $p \le \sqrt{991} \approx 31.48$; if 991 is 
not divisible by 2, 3, 5, ..., or 31, then it is prime. There are 11 such primes, 
and one checks (by long division) that none of them is a divisor of 991. (One 
can check that 1,000,099 is a prime in the same way, but it is a longer enterprise 
because its square root is a bit over 1000.) It is also tedious, but not difficult, to 
see that the numbers $f(n) = n^2 - n + 41$, for $1 \le n \le 40$, are all prime. 
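
The tedious checks can be handed to a machine. The following Python sketch applies the criterion of Proposition 1.3 directly: to test m, divide only by the primes p with $p \le \sqrt{m}$. The helper names `primes_up_to` and `is_prime` are illustrative choices, and the code is a plain transcription of the criterion rather than an efficient primality test.

```python
from math import isqrt


def primes_up_to(limit: int) -> list[int]:
    """All primes p <= limit, found by trial division by smaller primes."""
    found: list[int] = []
    for c in range(2, limit + 1):
        # c is prime if no earlier prime p with p*p <= c divides it.
        if all(c % p != 0 for p in found if p * p <= c):
            found.append(c)
    return found


def is_prime(m: int) -> bool:
    """Proposition 1.3: m >= 2 is prime if no prime p <= sqrt(m) divides it."""
    if m < 2:
        return False
    return all(m % p != 0 for p in primes_up_to(isqrt(m)))


small = primes_up_to(isqrt(991))
print(small)            # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31]
print(is_prime(991))    # True

f = lambda n: n * n - n + 41
print(all(is_prime(f(n)) for n in range(1, 41)))  # True
print(is_prime(f(41)))                            # False, since f(41) = 41 * 41
```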

Mathematical induction is a version of least criminal that is more convenient 
to use. The key idea is just this: Imagine a stairway to the sky. If its bottom step 
is white and if the next step above a white step is also white, then all the steps of 
the stairway must be white. (One can trace this idea back to Levi ben Gershon 
in 1321. There is an explicit description of induction, cited by Pascal, written 
by Francesco Maurolico in 1557.) For example, the statement “$2^n > n$ for all 

3 The contrapositive of an implication “P implies Q” is the implication “(not Q) implies 
(not P).” For example, the contrapositive of “If a series $\sum a_n$ converges, then $\lim_{n \to \infty} a_n = 0$” 
is “If $\lim_{n \to \infty} a_n \ne 0$, then $\sum a_n$ diverges.” If an implication is true, then so is its 
contrapositive; conversely, if the contrapositive is true, then so is the original implication. The 
strategy of this proof is to prove the contrapositive of the original implication. Although a 
statement and its contrapositive are logically equivalent, it is sometimes more convenient to 
prove the contrapositive. This method is also called indirect proof or proof by contradiction. 
$n \ge 1$” can be regarded as an infinite sequence of statements (a stairway to the 
sky): 

$2^1 > 1$; $2^2 > 2$; $2^3 > 3$; $2^4 > 4$; $2^5 > 5$; .... 

Certainly, $2^1 = 2 > 1$. If $2^{100} > 100$, then 
$2^{101} = 2 \times 2^{100} > 2 \times 100 = 100 + 100 > 101$. There is nothing magic about the exponent 100; the same 
idea shows, having reached any stair, that we can climb up to the next one. This 
argument will be formalized in Proposition 1.5. 

Theorem 1.4 (Mathematical Induction⁴). Given statements S(n), one for 
each natural number n, suppose that: 

(i) Base Step: S(0) is true; 

(ii) Inductive Step: if S(n) is true, then S(n + 1) is true. 

Then S(n) is true for all natural numbers n. 

Proof. We must show that the collection C of all those natural numbers n for 
which the statement S(n) is false is empty. 

If, on the contrary, C is nonempty, then there is a first false statement S(m). 
Since S(0) is true, by (i), we must have $m \ge 1$. This implies that $m - 1 \ge 0$, and 
so there is a statement S(m - 1) [there is no statement S(-1)]. As m is the least 
criminal, m - 1 must be honest; that is, S(m - 1) is true. But now (ii) says that 
S(m) = S([m - 1] + 1) is true, and this is a contradiction. We conclude that C 
is empty and, hence, that all the statements S(n) are true. • 

We now show how to use induction. 

Proposition 1.5. $2^n > n$ for all integers $n \ge 0$. 

Proof. The nth statement S(n) is 

$S(n)$: $2^n > n$. 

Two steps are required for induction, corresponding to the two hypotheses in 
Theorem 1.4. 

Base step. The initial statement 

$S(0)$: $2^0 > 0$ 

is true, for $2^0 = 1 > 0$. 

4 Induction, having a Latin root meaning “to lead,” came to mean “prevailing upon to do 
something” or “influencing.” This is an apt name here, for the nth statement influences the 
(n + 1)st one. 
Inductive step. If S(n) is true, then S(n + 1) is true; that is, using the inductive 
hypothesis S(n), we must prove 

$S(n + 1)$: $2^{n+1} > n + 1$. 

If $2^n > n$ is true, then multiplying both sides of its inequality by 2 gives the 
valid⁵ inequality: 

$2^{n+1} = 2 \times 2^n > 2n.$ 

Now $2n = n + n \ge n + 1$ (because $n \ge 1$; the remaining case n = 0 is immediate, 
since $2^1 = 2 > 1$), and hence $2^{n+1} > 2n \ge n + 1$, as 
desired. 

Having verified both the base step and the inductive step, we conclude that 
$2^n > n$ for all $n \ge 0$. • 

Induction is plausible in the same sense that the Least Integer Axiom is plausible. 
Suppose that a given list S(0), S(1), S(2), ... of statements has the property 
that S(n + 1) is true whenever S(n) is true. If, in addition, S(0) is true, 
then S(1) is true; the truth of S(1) now gives the truth of S(2); the truth of S(2) 
now gives the truth of S(3); and so forth. Induction replaces the phrase and so 
forth by the inductive step which guarantees, for every n, that there is never an 
obstruction in the passage from any statement S(n) to the next one, S(n + 1). 

Here are two comments before we give more illustrations of induction. First, 
one must verify both the base step and the inductive step; verification of only 
one of them is inadequate. For example, consider the statements $S(n)$: $n^2 = n$. 
The base step is true, but one cannot prove the inductive step (of course, these 
statements are false for all n > 1). Another example is given by the statements 
$S(n)$: $n = n + 1$. It is easy to see that the inductive step is true: if n = n + 1, then 
Proposition A.2 says that adding 1 to both sides gives n + 1 = (n + 1) + 1 = n + 2, 
which is the next statement, S(n + 1). But the base step is false (of course, all 
these statements are false). 

Second, when first seeing induction, many people suspect that the inductive 
step is circular reasoning: one is using S(n), and this is what one wants to prove! 
A closer analysis shows that this is not at all what is happening. The inductive 
step, by itself, does not prove that S(n + 1) is true. Rather, it says that if S(n) 
is true, then S(n + 1) is also true. In other words, the inductive step proves that 
the implication “If S(n) is true, then S(n + 1) is true” is correct. The truth of 
this implication is not the same thing as the truth of its conclusion. For example, 
consider the two statements: “Your grade on every exam is 100%” and “Your 
grade in the course is A.” The implication “If all your exams are perfect, then you 
will get the highest grade for the course” is true. Unfortunately, this does not say 
that it is inevitable that your grade in the course will be A. Our discussion above 

5 See Proposition A.2 in Appendix A, which gives the first properties of inequalities. 
gives a mathematical example: the implication “If n = n + 1, then n + 1 = n + 2” 
is true, but the conclusion “n + 1 = n + 2” is false. 

Remark. The Least Integer Axiom is enjoyed not only by $\mathbb{N}$, but also by any 
of its nonempty subsets Q (indeed, the proof of Proposition 1.1 uses the fact that 
the axiom holds for $Q = \{n \in \mathbb{N} : n \ge 2\}$). In terms of induction, this says 
that the base step can occur at any natural number k, not necessarily at k = 0. 
The conclusion, then, is that the statements S(n) are true for all $n \ge k$. The 
Least Integer Axiom is also enjoyed by the larger set $Q_m = \{n \in \mathbb{Z} : n \ge m\}$, 
where m is any, possibly negative, integer. If C is a nonempty subset of $Q_m$ and 
if $C \cap \{m, m + 1, \ldots, -1\}$⁶ is nonempty, then this finite set contains a smallest 
integer, which is the smallest integer in C. If $C \cap \{m, m + 1, \ldots, -1\}$ is empty, 
then C is actually a nonempty subset of $\mathbb{N}$, and the original axiom gives a smallest 
number in C. In terms of induction, this says that the base step can occur at any, 
possibly negative, integer k [assuming, of course, that there is a kth statement 
S(k)]. For example, if one has statements S(-1), S(0), S(1), ..., then the base 
step can occur at n = -1; the conclusion in this case is that the statements S(n) 
are true for all $n \ge -1$. ◄ 

Here is an induction with base step occurring at n = 1. 

Proposition 1.6. $1 + 2 + \cdots + n = \tfrac{1}{2}n(n + 1)$ for every integer $n \ge 1$. 

Proof. The proof is by induction on $n \ge 1$. 

Base step. If n = 1, then the left side is 1 and the right side is $\tfrac{1}{2} \cdot 1(1 + 1) = 1$, 
as desired. 

Inductive step. It is always a good idea to write the (n + 1)st statement 
S(n + 1) so one can see what has to be proved. Here, we must prove 

$S(n + 1)$: $1 + 2 + \cdots + n + (n + 1) = \tfrac{1}{2}(n + 1)(n + 2)$. 

By the inductive hypothesis, i.e., using S(n), the left side is 

$[1 + 2 + \cdots + n] + (n + 1) = \tfrac{1}{2}n(n + 1) + (n + 1),$ 

and high school algebra shows that $\tfrac{1}{2}n(n + 1) + (n + 1) = \tfrac{1}{2}(n + 1)(n + 2)$. By 
induction, the formula holds for all $n \ge 1$. • 
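
For readers who want the algebra in the inductive step spelled out, one way to carry it out (a sketch; other routes work equally well) is to factor n + 1 out of both terms:

```latex
\begin{align*}
\tfrac{1}{2}n(n+1) + (n+1)
  &= (n+1)\left(\tfrac{1}{2}n + 1\right) \\
  &= (n+1)\cdot\tfrac{1}{2}(n+2) \\
  &= \tfrac{1}{2}(n+1)(n+2).
\end{align*}
```

The last line is exactly the right side of S(n + 1).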

There is a story (it probably never happened) told about Gauss as a boy. 
One of his teachers asked the students to add up all the numbers from 1 to 100, 
thereby hoping to get some time for himself for other tasks. But Gauss quickly 

6 If C and D are subsets of a set X, then their intersection, denoted by $C \cap D$, is the subset 
consisting of all those x in X lying in both C and D. 
volunteered that the answer was 5050. He let s denote the sum of all the numbers 
from 1 to 100; $s = 1 + 2 + \cdots + 99 + 100$. Of course, $s = 100 + 99 + \cdots + 2 + 1$. 
Arrange these nicely: 

$s = 1 + 2 + \cdots + 99 + 100$ 
$s = 100 + 99 + \cdots + 2 + 1$ 

and add: 

$2s = 101 + 101 + \cdots + 101 + 101,$ 

the sum 101 occurring 100 times. We now solve: $s = \tfrac{1}{2}(100 \times 101) = 5050$. 
This argument is valid for any number n in place of 100 (and it does not use 
induction). Not only does this give a new proof of Proposition 1.6, it also shows 
how the formula could have been discovered.⁷ 

It is not always the case, in an inductive proof, that the base step is very 
simple. In fact, all possibilities can occur: both steps can be easy; both can be 
difficult; one is harder than the other. 


Proposition 1.7. If we assume $(fg)' = f'g + fg'$, the product rule for derivatives, 
then 

$(x^n)' = nx^{n-1}$ for all integers $n \ge 1$. 

Proof. We proceed by induction on $n \ge 1$. 

Base step. If n = 1, then we ask whether $(x)' = x^0 = 1$, the constant 
function identically equal to 1. By definition, 

$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}.$ 

When f(x) = x, therefore, 

$(x)' = \lim_{h \to 0} \frac{x + h - x}{h} = \lim_{h \to 0} \frac{h}{h} = 1.$ 

Inductive step. We must prove that $(x^{n+1})' = (n + 1)x^n$. It is permissible 
to use the inductive hypothesis, $(x^n)' = nx^{n-1}$, as well as $(x)' = 1$, for the base 

7 Actually, this formula goes back at least a thousand years (see Exercise 1.10 on page 13). 
Alhazen (Ibn al-Haytham) (965-1039) found a geometric way to add 

$1^k + 2^k + \cdots + n^k$ 

for any fixed integer $k \ge 1$ [see Exercise 1.11 on page 13]. 
step has already been proved. Since $x^{n+1} = x^n x$, the product rule gives 

$(x^{n+1})' = (x^n x)' = (x^n)'x + x^n(x)' = (nx^{n-1})x + x^n \cdot 1 = (n + 1)x^n.$ 

We conclude that $(x^n)' = nx^{n-1}$ is true for all $n \ge 1$. • 

Here is an example of an induction whose base step occurs at n = 5. Con- 
sider the statements 

$S(n)$: $2^n > n^2$. 

This is not true for small values of n: if n = 2 or 4, then there is equality, not 
inequality; if n = 3, the left side, 8, is smaller than the right side, 9. However, 
S(5) is true, for 32 > 25. 

Proposition 1.8. $2^n > n^2$ is true for all integers $n \ge 5$. 

Proof. We have just checked the base step S(5). In proving 

$S(n + 1)$: $2^{n+1} > (n + 1)^2$, 

we are allowed to assume that $n \ge 5$ (actually, we will need only $n \ge 3$ to prove 
the inductive step) as well as the inductive hypothesis. Multiply both sides of 
$2^n > n^2$ by 2 to get 

$2^{n+1} = 2 \times 2^n > 2n^2 = n^2 + n^2 = n^2 + nn.$ 

Since $n \ge 5$, we have $n \ge 3$, and so 

$nn \ge 3n = 2n + n \ge 2n + 1.$ 

Therefore, 

$2^{n+1} > n^2 + nn \ge n^2 + 2n + 1 = (n + 1)^2.$ • 

There is another version of induction, usually called the second form of 
induction, that is sometimes more convenient to use. 


Definition. The predecessors of a natural number $n \ge 1$ are the natural num- 
bers k with k < n, namely, 0, 1, 2, ..., n - 1 (0 has no predecessor). 

Theorem 1.9 (Second Form of Induction). Let S(n) be a family of state- 
ments, one for each natural number n, and suppose that: 

(i) S(0) is true; 

(ii) if S(k) is true for all predecessors k of n, then S(n) is itself true. 

Then S(n) is true for all natural numbers n. 
Proof. It suffices to show that there are no integers n for which S(n) is false; 
that is, the collection C of all positive integers n for which S(n) is false is empty. 

If, on the contrary, C is nonempty, then there is a least criminal m: there is 
a first false statement S(m). Since S(0) is true, by (i), we must have $m \ge 1$. As 
m is the least criminal, k must be honest for all k < m; in other words, S(k) 
is true for all the predecessors of m. Then, by (ii), S(m) is true, and this is a 
contradiction. We conclude that C is empty and, hence, that all the statements 
S(n) are true. • 

The second form of induction can be used to give a second proof of Theo- 
rem 1.2. As with the first form, the base step need not occur at 0. 

Theorem 1.10 (= Theorem 1.2). Every integer $n \ge 2$ is either a prime or a
product of primes.

Proof.$^{8}$ Base step. The statement is true when $n = 2$ because 2 is a prime.

Inductive step. If $n > 2$ is a prime, we are done. Otherwise, $n = ab$, where
$2 \le a < n$ and $2 \le b < n$. As $a$ and $b$ are predecessors of $n$, each of them is
either prime or a product of primes:

$$a = pp'p'' \cdots \quad\text{and}\quad b = qq'q'' \cdots,$$

and so $n = pp'p'' \cdots qq'q'' \cdots$ is a product of (at least two) primes. •
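The proof is effectively a recursive procedure: split $n = ab$ and factor the two predecessors. A minimal sketch follows; the name `prime_factors` and the trial-division search for a divisor are assumptions made for illustration and are not part of the text.

```python
from math import isqrt


def prime_factors(n: int) -> list[int]:
    """Return a list of primes whose product is n, for n >= 2."""
    if n < 2:
        raise ValueError("n must be at least 2")
    # Look for a factorization n = a * b with 2 <= a < n and 2 <= b < n.
    for a in range(2, isqrt(n) + 1):
        if n % a == 0:
            b = n // a
            # a and b are predecessors of n, so the recursion terminates
            # (this mirrors the inductive hypothesis of the second form).
            return prime_factors(a) + prime_factors(b)
    # No such factorization exists, so n itself is prime.
    return [n]
```

For instance, `prime_factors(360)` returns `[2, 2, 2, 3, 3, 5]`.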

The reason why the second form of induction is more convenient here is that
it is more natural to use $S(a)$ and $S(b)$ than to use $S(n - 1)$; indeed, it is not at
all clear how to use $S(n - 1)$.

Here is a notational remark. We can rephrase the inductive step in the first 
form of induction: if S(n — 1) is true, then S(n) is true (we are still saying 
that if a statement is true, then so is the next statement). With this rephrasing, 
we can now compare the inductive steps of the two forms of induction. Each
wants to prove $S(n)$: the inductive hypothesis of the first form is $S(n - 1)$; the
inductive hypothesis of the second form is any or all of the preceding statements
$S(0), S(1), \ldots, S(n - 1)$. Thus, the second form appears to have a stronger
inductive hypothesis. In fact, Exercise 1.21 on page 15 asks you to prove that 
both forms of mathematical induction are equivalent. 

The next result says that one can always factor out a largest power of 2 from 
any integer. 

Proposition 1.11. Every integer $n \ge 1$ has a unique factorization $n = 2^k m$,
where $k \ge 0$ and $m \ge 1$ is odd.

8 The similarity of the proofs of Theorems 1.2 and 1.10 indicates that the second form of 
induction is merely a variation of least criminal. 
Proof. We use the second form of induction on $n \ge 1$ to prove the existence of
$k$ and $m$; the reader should see that it is more appropriate here than the first form.

Base step. If $n = 1$, take $k = 0$ and $m = 1$.

Inductive step. If $n > 1$, then $n$ is either odd or even. If $n$ is odd, then take
$k = 0$ and $m = n$. If $n$ is even, then $n = 2b$. Because $b < n$, it is a predecessor
of $n$, and so the inductive hypothesis allows us to assume $S(b) : b = 2^{\ell} m$, where
$\ell \ge 0$ and $m$ is odd. The desired factorization is $n = 2b = 2^{\ell+1} m$.

The word unique means “exactly one.” We prove uniqueness by showing
that if $n = 2^k m = 2^t m'$, where both $k$ and $t$ are nonnegative and both $m$ and $m'$
are odd, then $k = t$ and $m = m'$. We may assume that $k \ge t$. If $k > t$, then
canceling $2^t$ from both sides gives $2^{k-t} m = m'$. Since $k - t > 0$, the left side
is even while the right side is odd; this contradiction shows that $k = t$. We may
thus cancel $2^k$ from both sides, leaving $m = m'$. •
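The existence half of this proof translates directly into a recursion on the predecessor $b = n/2$. The sketch below assumes the hypothetical name `power_of_two_factorization`; it returns the pair $(k, m)$.

```python
def power_of_two_factorization(n: int) -> tuple[int, int]:
    """Return (k, m) with n == 2**k * m, k >= 0 and m odd, for n >= 1."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if n % 2 == 1:
        # Odd case (this includes the base step n = 1): take k = 0 and m = n.
        return 0, n
    # Even case: n = 2b with b < n, so apply the inductive hypothesis to b.
    k, m = power_of_two_factorization(n // 2)
    return k + 1, m
```

For example, `power_of_two_factorization(48)` returns `(4, 3)`, since $48 = 2^4 \cdot 3$.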

The ancient Greeks thought that a rectangular figure is most pleasing to the 
eye if its edges a and b are in the proportion 

a : b = b : (a + b). 

In this case, $a(a + b) = b^2$, so that $b^2 - ab - a^2 = 0$; that is, $(b/a)^2 - b/a - 1 = 0$.
The quadratic formula gives $b/a = \frac{1}{2}(1 \pm \sqrt{5})$. Therefore,

$$b/a = \alpha = \tfrac{1}{2}(1 + \sqrt{5}) \quad\text{or}\quad b/a = \beta = \tfrac{1}{2}(1 - \sqrt{5}).$$

The number $\alpha$, approximately 1.61803, is called the golden ratio. Since $\alpha$ is a
root of $x^2 - x - 1$, as is $\beta$, we have

$$\alpha^2 = \alpha + 1 \quad\text{and}\quad \beta^2 = \beta + 1.$$

The reason for discussing the golden ratio is that it is intimately related to the 
Fibonacci sequence. 

Definition. The Fibonacci sequence $F_0, F_1, F_2, \ldots$ is defined as follows:

$$F_0 = 0, \quad F_1 = 1, \quad\text{and}\quad F_n = F_{n-1} + F_{n-2} \text{ for all integers } n \ge 2.$$

The Fibonacci sequence begins: 0, 1, 1, 2, 3, 5, 8, 13, . . .

Theorem 1.12. If $F_n$ denotes the $n$th term of the Fibonacci sequence, then for
all $n \ge 0$,

$$F_n = \tfrac{1}{\sqrt{5}}(\alpha^n - \beta^n),$$

where $\alpha = \frac{1}{2}(1 + \sqrt{5})$ and $\beta = \frac{1}{2}(1 - \sqrt{5})$.
Proof. We are going to use the second form of induction [the second form is
the appropriate induction here, for the equation $F_n = F_{n-1} + F_{n-2}$ suggests that
proving $S(n)$ will involve not only $S(n - 1)$ but $S(n - 2)$ as well].

Base step. The formula is true for $n = 0$: $\frac{1}{\sqrt{5}}(\alpha^0 - \beta^0) = 0 = F_0$. The
formula is also true for $n = 1$:

$$\tfrac{1}{\sqrt{5}}(\alpha^1 - \beta^1) = \tfrac{1}{\sqrt{5}}(\alpha - \beta)
= \tfrac{1}{\sqrt{5}}\left[\tfrac{1}{2}(1 + \sqrt{5}) - \tfrac{1}{2}(1 - \sqrt{5})\right]
= \tfrac{1}{\sqrt{5}}(\sqrt{5}) = 1 = F_1.$$

(We have mentioned both $n = 0$ and $n = 1$ because verifying the inductive
hypothesis for $F_n$ requires our using the truth of the statements for both $F_{n-1}$
and $F_{n-2}$. For example, knowing only that $F_2 = \frac{1}{\sqrt{5}}(\alpha^2 - \beta^2)$ is not enough to
prove that the formula for $F_3$ is true; one also needs the formula for $F_1$.)

Inductive step. If $n \ge 2$, then

$$\begin{aligned}
F_n &= F_{n-1} + F_{n-2} \\
&= \tfrac{1}{\sqrt{5}}(\alpha^{n-1} - \beta^{n-1}) + \tfrac{1}{\sqrt{5}}(\alpha^{n-2} - \beta^{n-2}) \\
&= \tfrac{1}{\sqrt{5}}\left[(\alpha^{n-1} + \alpha^{n-2}) - (\beta^{n-1} + \beta^{n-2})\right] \\
&= \tfrac{1}{\sqrt{5}}\left[\alpha^{n-2}(\alpha + 1) - \beta^{n-2}(\beta + 1)\right] \\
&= \tfrac{1}{\sqrt{5}}\left[\alpha^{n-2}(\alpha^2) - \beta^{n-2}(\beta^2)\right] \\
&= \tfrac{1}{\sqrt{5}}(\alpha^n - \beta^n),
\end{aligned}$$

because $\alpha + 1 = \alpha^2$ and $\beta + 1 = \beta^2$. •
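The closed formula of Theorem 1.12 can be compared numerically with the recursive definition. This is a sketch under stated assumptions: the function names are hypothetical, and because it uses floating-point arithmetic with rounding to the nearest integer, it is reliable only for moderately small $n$.

```python
import math

ALPHA = (1 + math.sqrt(5)) / 2  # the golden ratio
BETA = (1 - math.sqrt(5)) / 2


def fib_by_definition(n: int) -> int:
    """F_0 = 0, F_1 = 1, F_n = F_{n-1} + F_{n-2}, computed iteratively."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def fib_closed_form(n: int) -> int:
    """Evaluate (ALPHA**n - BETA**n) / sqrt(5) and round to the nearest integer."""
    return round((ALPHA**n - BETA**n) / math.sqrt(5))
```

The rounding step only absorbs floating-point error; the theorem itself says the expression is exactly an integer.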

It is curious that the integers $F_n$ are expressed in terms of the irrational number
$\sqrt{5}$.

Corollary 1.13. If $\alpha = \frac{1}{2}(1 + \sqrt{5})$, then $F_n > \alpha^{n-2}$ for all integers $n \ge 3$.

Remark. If $n = 2$, then $F_2 = 1 = \alpha^0$, and so there is equality, not inequality. ◄

Proof. Base step. If $n = 3$, then $F_3 = 2 > \alpha$, for $\alpha \approx 1.618$.

Inductive step. We must show that $F_{n+1} > \alpha^{n-1}$. By the inductive hypothesis,

$$F_{n+1} = F_n + F_{n-1} > \alpha^{n-2} + \alpha^{n-3} = \alpha^{n-3}(\alpha + 1) = \alpha^{n-3}\alpha^2 = \alpha^{n-1}. \quad \bullet$$
One can also use induction to give definitions. For example, we can define
$n$ factorial,$^{9}$ denoted by $n!$, by induction on $n \ge 0$. Define $0! = 1$, and if $n!$ is
known, then define

$$(n + 1)! = n!\,(n + 1).$$

One reason for defining $0! = 1$ will be apparent in the next section.


Exercises 

1.1 Find a formula for $1 + 3 + 5 + \cdots + (2n - 1)$, and use mathematical induction to
prove that your formula is correct. (Inductive reasoning is used in mathematics to 
help guess what might be true. Once a guess has been made, it must still be proved, 
perhaps using mathematical induction, perhaps by some other method.) 

1.2 Find a formula for $1 + \sum_{j=1}^{n} j!\,j$, and use induction to prove that your formula is
correct. 

*1.3 (i) For any $n \ge 0$ and any $r \ne 1$, prove that

$$1 + r + r^2 + r^3 + \cdots + r^n = (1 - r^{n+1})/(1 - r).$$

(ii) Prove that

$$1 + 2 + 2^2 + \cdots + 2^n = 2^{n+1} - 1.$$

1.4 Show, for all $n \ge 1$, that $10^n$ leaves remainder 1 after dividing by 9.

1.5 Prove that if $0 \le a \le b$, then $a^n \le b^n$ for all $n \ge 0$.

1.6 Prove that $1^2 + 2^2 + \cdots + n^2 = \frac{1}{6}n(n + 1)(2n + 1) = \frac{1}{3}n^3 + \frac{1}{2}n^2 + \frac{1}{6}n$.

1.7 Prove that $1^3 + 2^3 + \cdots + n^3 = \frac{1}{4}n^4 + \frac{1}{2}n^3 + \frac{1}{4}n^2$.

1.8 Prove that $1^4 + 2^4 + \cdots + n^4 = \frac{1}{5}n^5 + \frac{1}{2}n^4 + \frac{1}{3}n^3 - \frac{1}{30}n$.

1.9 (M. Barr) There is a famous anecdote describing a hospital visit of G. H. Hardy 
to Ramanujan. Hardy mentioned that the number 1729 of the taxi he had taken to 
the hospital was not an interesting number. Ramanujan disagreed, saying that it 
is the smallest positive integer that can be written as the sum of two cubes in two 
different ways. 

(i) Prove that Ramanujan’s statement is true. 

(ii) Prove that Ramanujan’s statement is false. 

*1.10 Derive the formula for $\sum_{i=1}^{n} i$ by computing the area $(n + 1)^2$ of a square with
sides of length $n + 1$ using Figure 1.1.

*1.11 (i) Derive the formula for $\sum_{i=1}^{n} i$ by computing the area $n(n + 1)$ of a rectangle
with height $n + 1$ and base $n$, as pictured in Figure 1.2.

(ii) (Alhazen) For fixed $k \ge 1$, use Figure 1.3 to prove

$$(n + 1)\sum_{i=1}^{n} i^k = \sum_{i=1}^{n} i^{k+1} + \sum_{i=1}^{n}\left(\sum_{p=1}^{i} p^k\right).$$

(iii) Given the formula $\sum_{i=1}^{n} i = \frac{1}{2}n(n + 1)$, use part (ii) to derive the formula
for $\sum_{i=1}^{n} i^2$.

Figure 1.1 $1 + 2 + \cdots + n = \frac{1}{2}(n^2 + n)$

Figure 1.2 $1 + 2 + \cdots + n = \frac{1}{2}n(n + 1)$

Figure 1.3 Alhazen’s Dissection

9 The term factor comes from the Latin “to make” or “to contribute”; the term factorial
recalls that $n!$ has many factors.


1.12 (i) Prove that $2^n > n^3$ for all $n \ge 10$.

(ii) Prove that $2^n > n^4$ for all $n \ge 17$.

(iii) If $k$ is a natural number, prove that $2^n > n^k$ for all $n \ge k^2 + 1$.

1.13 Around 1350, N. Oresme was able to sum the series $\sum_{n=1}^{\infty} n/2^n$ by dissecting the
region $R$ in Figure 1.4 in two ways. Let $A_n$ be the vertical rectangle with base $1/2^n$
and height $n$, so that area$(A_n) = n/2^n$, and let $B_n$ be the horizontal rectangle with
base $1/2^n + 1/2^{n+1} + \cdots$ and height 1. Prove that $\sum_{n=1}^{\infty} n/2^n = 2$.

*1.14 Let $g_1(x), \ldots, g_n(x)$ be differentiable functions, and let $f(x)$ be their product:
$f(x) = g_1(x) \cdots g_n(x)$. Prove, for all integers $n \ge 2$, that the derivative

$$f'(x) = \sum_{i=1}^{n} g_1(x) \cdots g_{i-1}(x)\, g_i'(x)\, g_{i+1}(x) \cdots g_n(x).$$

Figure 1.4 Oresme’s Dissections


1.15 Prove, for every $n \in \mathbb{N}$, that $(1 + x)^n \ge 1 + nx$ whenever $x \in \mathbb{R}$ and $1 + x > 0$.

1.16 Prove that every positive integer $a$ has a unique factorization $a = 3^k m$, where
$k \ge 0$ and $m$ is not a multiple of 3.

1.17 Prove that $F_n < 2^n$ for all $n \ge 0$, where $F_0, F_1, F_2, \ldots$ is the Fibonacci sequence.

1.18 If $F_n$ denotes the $n$th term of the Fibonacci sequence, prove that

$$\sum_{n=1}^{m} F_n = F_{m+2} - 1.$$

1.19 Prove that $4^{n+1} + 5^{2n-1}$ is divisible by 21 for all $n \ge 1$.

1.20 For any integer n > 2, prove that there are n consecutive composite numbers. 
Conclude that the gap between consecutive primes can be arbitrarily large. 

*1.21 Prove that the first and second forms of mathematical induction are equivalent; that
is, prove that Theorem 1.4 is true if and only if Theorem 1.9 is true.

*1.22 (Double Induction) Let $S(m, n)$ be a doubly indexed family of statements, one for
each $m \ge 0$ and $n \ge 0$. Suppose that

(i) $S(0, 0)$ is true;

(ii) if $S(m, 0)$ is true, then $S(m + 1, 0)$ is true;

(iii) if $S(m, n)$ is true for all $m$, then $S(m, n + 1)$ is true for all $m$.

Prove that $S(m, n)$ is true for all $m \ge 0$ and $n \ge 0$.

1.23 Use double induction to prove that

$$(m + 1)^n > mn$$

for all $m, n \ge 0$.
1.2 Binomial Coefficients

What is the pattern of the coefficients in the formulas for the powers $(1 + x)^n$ of
the binomial $1 + x$? The first few such formulas are:

$$\begin{aligned}
(1 + x)^0 &= 1 \\
(1 + x)^1 &= 1 + 1x \\
(1 + x)^2 &= 1 + 2x + 1x^2 \\
(1 + x)^3 &= 1 + 3x + 3x^2 + 1x^3 \\
(1 + x)^4 &= 1 + 4x + 6x^2 + 4x^3 + 1x^4.
\end{aligned}$$

Figure 1.5, called Pascal’s triangle, after B. Pascal (1623-1662), displays
an arrangement of the first few coefficients. Figure 1.6, a picture from China in
the year 1303, shows that the pattern of coefficients had been recognized long
before Pascal was born.

                1
               1 1
              1 2 1
             1 3 3 1
            1 4 6 4 1
          1 5 10 10 5 1
        1 6 15 20 15 6 1
      1 7 21 35 35 21 7 1

Figure 1.5

The expansion of $(1 + x)^n$ is an expression of the form

$$c_0 + c_1 x + c_2 x^2 + \cdots + c_n x^n.$$

The coefficients $c_r$ are called binomial coefficients.$^{10}$ L. Euler (1707-1783)
introduced the notation $\left(\frac{n}{r}\right)$ for them; this symbol evolved into $\binom{n}{r}$, which is
generally accepted nowadays:

$$\binom{n}{r} = \text{coefficient } c_r \text{ of } x^r \text{ in } (1 + x)^n.$$

Hence,

$$(1 + x)^n = \sum_{r=0}^{n} \binom{n}{r} x^r.$$

The number $\binom{n}{r}$ is pronounced “n choose r” because it also arises in counting
problems, as we shall see later in this section.

Figure 1.6 Pascal’s Triangle, China, ca. 1300

10 Binomial, coming from the Latin bi, meaning “two,” and nomen, meaning “name” or
“term,” describes expressions of the form $a + b$. Similarly, trinomial describes expressions of
the form $a + b + c$, and monomial describes expressions with a single term. The word is used
here because the binomial coefficients arise when expanding powers of the binomial $1 + x$.
The word polynomial is a hybrid, coming from the Greek poly meaning “many” and the Latin
nomen; polynomials are certain expressions having many terms.

Observe, in Figure 1.5, that an inside number (i.e., not a 1 on the border)
of the $(n + 1)$th row can be computed by going up to the $n$th row and adding
the two neighboring numbers above it. For example, the inside numbers in row 4

     1 3 3 1
    1 4 6 4 1

can be computed from row 3 as follows: $4 = 1 + 3$, $6 = 3 + 3$, and $4 = 3 + 1$. Let
us prove that this observation always holds.
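The observation gives a simple way to generate the rows of Figure 1.5. The following sketch assumes a hypothetical helper `pascal_rows` and numbers rows from 0, so that row $n$ lists the coefficients of $(1 + x)^n$.

```python
def pascal_rows(count: int) -> list[list[int]]:
    """Return rows 0, 1, ..., count - 1 of Pascal's triangle."""
    rows = [[1]]
    while len(rows) < count:
        prev = rows[-1]
        # Each inside number is the sum of the two neighbors above it.
        inside = [prev[i - 1] + prev[i] for i in range(1, len(prev))]
        rows.append([1] + inside + [1])
    return rows[:count]
```

With this numbering, `pascal_rows(5)[4]` is `[1, 4, 6, 4, 1]`, built from row 3 exactly as in the example above.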

Lemma 1.14. For all integers $n \ge 1$ and all $r$ with $0 < r < n + 1$,

$$\binom{n+1}{r} = \binom{n}{r-1} + \binom{n}{r}.$$

Proof. We must show, for all $n \ge 1$, that if

$$(1 + x)^n = c_0 + c_1 x + c_2 x^2 + \cdots + c_n x^n,$$

then the coefficient of $x^r$ in $(1 + x)^{n+1}$ is $c_{r-1} + c_r$. Since $c_0 = 1$,

$$\begin{aligned}
(1 + x)^{n+1} &= (1 + x)(1 + x)^n \\
&= (1 + x)^n + x(1 + x)^n \\
&= (c_0 + c_1 x + c_2 x^2 + \cdots + c_n x^n) + x(c_0 + c_1 x + c_2 x^2 + \cdots + c_n x^n) \\
&= (c_0 + c_1 x + c_2 x^2 + \cdots + c_n x^n) + c_0 x + c_1 x^2 + c_2 x^3 + \cdots + c_n x^{n+1} \\
&= 1 + (c_0 + c_1)x + (c_1 + c_2)x^2 + (c_2 + c_3)x^3 + \cdots.
\end{aligned}$$

Thus $\binom{n+1}{r}$, the coefficient of $x^r$ in $(1 + x)^{n+1}$, is

$$c_{r-1} + c_r = \binom{n}{r-1} + \binom{n}{r}. \quad \bullet$$


Proposition 1.15 (Pascal). For all $n \ge 0$ and all $r$ with $0 \le r \le n$,

$$\binom{n}{r} = \frac{n!}{r!\,(n - r)!}.$$

Proof. We prove the proposition by induction on $n \ge 0$.

Base step.$^{11}$ If $n = 0$, then

$$\binom{0}{0} = 0!/0!\,0! = 1.$$

Inductive step. Assuming the formula for $\binom{n}{r}$ for all $r$, we must prove

$$\binom{n+1}{r} = \frac{(n + 1)!}{r!\,(n + 1 - r)!}.$$

If $r = 0$, then $\binom{n+1}{0} = 1 = (n + 1)!/0!\,(n + 1 - 0)!$; if $r = n + 1$, then
$\binom{n+1}{n+1} = 1 = (n + 1)!/(n + 1)!\,0!$; if $0 < r < n + 1$, we use Lemma 1.14:

$$\begin{aligned}
\binom{n+1}{r} &= \binom{n}{r-1} + \binom{n}{r} \\
&= \frac{n!}{(r - 1)!\,(n - r + 1)!} + \frac{n!}{r!\,(n - r)!} \\
&= \frac{n!}{(r - 1)!\,(n - r)!}\left(\frac{1}{n - r + 1} + \frac{1}{r}\right) \\
&= \frac{n!}{(r - 1)!\,(n - r)!}\left(\frac{r + n - r + 1}{r(n - r + 1)}\right) \\
&= \frac{n!}{(r - 1)!\,(n - r)!}\left(\frac{n + 1}{r(n - r + 1)}\right) \\
&= \frac{(n + 1)!}{r!\,(n + 1 - r)!}. \quad \bullet
\end{aligned}$$

Corollary 1.16. For any real number $x$ and for all integers $n \ge 0$,

$$(1 + x)^n = \sum_{r=0}^{n} \binom{n}{r} x^r = \sum_{r=0}^{n} \frac{n!}{r!\,(n - r)!}\, x^r.$$

11 This is one reason why $0!$ is defined to be 1.

Proof. The first equation is the definition of the binomial coefficients, and the
second equation replaces $\binom{n}{r}$ by the value given in Pascal’s theorem. •
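The two descriptions of $\binom{n}{r}$, as a coefficient of $(1 + x)^n$ and as $n!/r!(n - r)!$, can be compared directly. In this sketch the names `expand_one_plus_x` and `pascal_formula` are hypothetical; the first multiplies by $1 + x$ repeatedly, as in the proof of Lemma 1.14, and the second evaluates Pascal's formula.

```python
from math import factorial


def expand_one_plus_x(n: int) -> list[int]:
    """Return [c_0, ..., c_n], the coefficients of (1 + x)**n."""
    coeffs = [1]  # (1 + x)**0
    for _ in range(n):
        # (1 + x) * p(x) = p(x) + x * p(x): add the list to a shifted copy.
        padded = coeffs + [0]
        shifted = [0] + coeffs
        coeffs = [p + s for p, s in zip(padded, shifted)]
    return coeffs


def pascal_formula(n: int, r: int) -> int:
    """Evaluate n! / (r! (n - r)!) for 0 <= r <= n."""
    return factorial(n) // (factorial(r) * factorial(n - r))
```

The integer division is exact here, because the proposition says the quotient is the integer $c_r$.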


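Corollary 1.17 (Binomial Theorem). For all real numbers $a$ and $b$ and for
all integers $n \ge 1$,

$$(a + b)^n = \sum_{r=0}^{n} \binom{n}{r} a^{n-r} b^r = \sum_{r=0}^{n} \frac{n!}{r!\,(n - r)!}\, a^{n-r} b^r.$$

Proof. The result is trivially true when $a = 0$ (if we agree that $0^0 = 1$). If
$a \ne 0$, set $x = b/a$ in Corollary 1.16, and observe that

$$\left(1 + \frac{b}{a}\right)^n = \left(\frac{a + b}{a}\right)^n = \frac{(a + b)^n}{a^n}.$$

Therefore,

$$(a + b)^n = a^n\left(1 + \frac{b}{a}\right)^n = a^n \sum_{r=0}^{n} \binom{n}{r} \frac{b^r}{a^r} = \sum_{r=0}^{n} \binom{n}{r} a^{n-r} b^r. \quad \bullet$$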
Remark. The binomial theorem can be proved without first proving Corollary 1.16;
just prove the formula for $(a + b)^n$ by induction on $n \ge 0$. We have
chosen the proof above for clearer exposition. ◄

Here is a combinatorial interpretation of the binomial coefficients. Given a
set $X$, an $r$-subset is a subset of $X$ with exactly $r$ elements. If $X$ has $n$ elements,
denote the number of its $r$-subsets by

$$[n, r];$$

that is, $[n, r]$ is the number of ways one can choose $r$ things from a box of $n$
things.

We compute $[n, r]$ by considering a related question. Given an “alphabet”
with $n$ (distinct) letters and a number $r$ with $1 \le r \le n$, an $r$-anagram is a
sequence of $r$ of these letters with no repetitions. For example, the 2-anagrams
on the alphabet $a, b, c$ are

ab, ba, ac, ca, bc, cb

(note that aa, bb, cc are not on this list). How many $r$-anagrams are there on an
alphabet with $n$ letters? We count the number of such anagrams in two ways.

(1) There are $n$ choices for the first letter; since no letter is repeated, there
are only $n - 1$ choices for the second letter, only $n - 2$ choices for the third letter,
and so forth. Thus, the number of $r$-anagrams is

$$n(n - 1)(n - 2) \cdots (n - [r - 1]) = n(n - 1)(n - 2) \cdots (n - r + 1).$$
Note the special case $n = r$: the number of $n$-anagrams on $n$ letters is $n!$.

(2) Here is a second way to count these anagrams. First choose an $r$-subset
of the alphabet (consisting of $r$ letters); there are $[n, r]$ ways to do this, for this
is exactly what the symbol $[n, r]$ means. For each chosen $r$-subset, there are $r!$
ways to arrange the $r$ letters in it (this is the special case of our first count when
$n = r$). The number of $r$-anagrams is thus

$$r!\,[n, r].$$

We conclude that

$$r!\,[n, r] = n(n - 1)(n - 2) \cdots (n - r + 1),$$

from which it follows, by Pascal’s formula, that

$$[n, r] = n(n - 1)(n - 2) \cdots (n - r + 1)/r! = \binom{n}{r}.$$

This is why the binomial coefficient $\binom{n}{r}$ is often pronounced as “n choose r.”
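The double count can be replayed by brute force on a small alphabet. The sketch below uses `itertools.permutations` for $r$-anagrams and `itertools.combinations` for $r$-subsets; the helper names are hypothetical, and enumeration of this kind is practical only for small $n$.

```python
from itertools import combinations, permutations
from math import factorial


def count_anagrams(alphabet: str, r: int) -> int:
    """Count sequences of r distinct letters from the alphabet."""
    return sum(1 for _ in permutations(alphabet, r))


def count_subsets(alphabet: str, r: int) -> int:
    """Count r-subsets of the alphabet; this is [n, r] in the text."""
    return sum(1 for _ in combinations(alphabet, r))


def falling_product(n: int, r: int) -> int:
    """Compute n(n - 1)(n - 2)...(n - r + 1)."""
    product = 1
    for j in range(r):
        product *= n - j
    return product


two_anagrams = sorted("".join(p) for p in permutations("abc", 2))
```

For the alphabet `"abcde"` and $r = 3$, both counts of the anagrams give 60, and dividing by $3! = 6$ leaves the 10 three-element subsets.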

As an example, how many ways are there to choose 2 hats from a closet 
containing 14 different hats? (One of my friends does not like the phrasing of 
this question. After all, one can choose 2 hats with one’s left hand, with one’s 
right hand, with one’s teeth, . . . ; but I continue the evil tradition.) The answer is
$\binom{14}{2}$, and Pascal’s formula allows us to compute this as $(14 \times 13)/2 = 91$.

Our first interpretation of the binomial coefficients was algebraic, that 
is, as coefficients of polynomials which can be calculated by Pascal’s formula; 
our second interpretation is combinatorial, that is, as n choose r. Quite often, 
each interpretation can be used to prove a desired result. For example, here is a 
combinatorial proof of Lemma 1.14. Let $X$ be a set with $n + 1$ elements, and
let us color one of its elements red and the other $n$ elements blue. Now $\binom{n+1}{r}$
is the number of $r$-subsets of $X$. There are two possibilities for an $r$-subset $Y$:
either it contains the red element or it is all blue. If $Y$ contains the red element,
then $Y$ consists of the red element and $r - 1$ blue elements, and so the number of
such $Y$ is the same as the number of all blue $(r - 1)$-subsets, namely, $\binom{n}{r-1}$. The
other possibility is that $Y$ is all blue, and there are $\binom{n}{r}$ such $r$-subsets. Therefore,
$\binom{n+1}{r} = \binom{n}{r-1} + \binom{n}{r}$, as desired.

We are now going to apply the binomial theorem to trigonometry, but we 
begin by reviewing properties of the complex numbers. Recall that the modulus 
$|z|$ of a complex number $z = a + ib$ is defined to be

$$|z| = \sqrt{a^2 + b^2}.$$

If we identify a complex number $z = a + ib$ with the point $(a, b)$ in the plane,
then its modulus $|z|$ is the distance from $z$ to the origin. It follows that every
complex number $z$ of modulus 1 corresponds to a point on the unit circle, and
so it has coordinates $(\cos\theta, \sin\theta)$ for some angle $\theta$ (in the right triangle $OPA$
in Figure 1.7, we have $\cos\theta = |OA|/|OP| = |OA|$, because $|OP| = 1$, and
$\sin\theta = |PA|/|OP| = |PA|$).



Proposition 1.18 (Polar Decomposition). Every complex number $z$ has a factorization

$$z = r(\cos\theta + i\sin\theta),$$

where $r = |z| \ge 0$ and $0 \le \theta < 2\pi$.

Proof. If $z = 0$, then $|z| = 0$ and any choice of $\theta$ works. If $z = a + bi \ne 0$, then
$|z| \ne 0$. Now $z/|z| = a/|z| + ib/|z|$ has modulus 1, for $(a/|z|)^2 + (b/|z|)^2 =
(a^2 + b^2)/|z|^2 = 1$. Therefore, there is an angle $\theta$ with

$$\frac{z}{|z|} = \cos\theta + i\sin\theta,$$

and so $z = |z|(\cos\theta + i\sin\theta) = r(\cos\theta + i\sin\theta)$. •

If $z = a + ib = r(\cos\theta + i\sin\theta)$, then $(r, \theta)$ are the polar coordinates$^{12}$ of
$z$; this is the reason Proposition 1.18 is called the polar decomposition of $z$.

The trigonometric addition formulas for $\cos(\theta + \psi)$ and $\sin(\theta + \psi)$ have a
lovely translation in the language of complex numbers.

12 A pole is an axis about which rotation occurs. For example, the axis of the Earth has 
endpoints the North and South Poles. Here, we take the pole to be the z-axis (perpendicular 
to the plane). 
Proposition 1.19 (Addition Theorem). If

$$z = \cos\theta + i\sin\theta \quad\text{and}\quad w = \cos\psi + i\sin\psi,$$

then

$$zw = \cos(\theta + \psi) + i\sin(\theta + \psi).$$

Proof.

$$\begin{aligned}
zw &= (\cos\theta + i\sin\theta)(\cos\psi + i\sin\psi) \\
&= (\cos\theta\cos\psi - \sin\theta\sin\psi) + i(\sin\theta\cos\psi + \cos\theta\sin\psi).
\end{aligned}$$

The trigonometric addition formulas show that

$$zw = \cos(\theta + \psi) + i\sin(\theta + \psi). \quad \bullet$$

The addition theorem gives a geometric interpretation of complex multiplication:
if $z = r(\cos\theta + i\sin\theta)$ and $w = s(\cos\psi + i\sin\psi)$, then

$$zw = rs[\cos(\theta + \psi) + i\sin(\theta + \psi)],$$

and the polar coordinates of $zw$ are

$$(rs, \theta + \psi).$$


Corollary 1.20. If $z$ and $w$ are complex numbers, then

$$|zw| = |z|\,|w|.$$

Proof. If the polar decompositions of $z$ and $w$ are $z = r(\cos\theta + i\sin\theta)$ and
$w = s(\cos\psi + i\sin\psi)$, respectively, then we have just seen that $|z| = r$, $|w| = s$,
and $|zw| = rs$. •

It follows from this corollary that if z and w lie on the unit circle, then their 
product zw also lies on the unit circle. 

In 1707, A. De Moivre (1667-1754) proved the following elegant result. 


Theorem 1.21 (De Moivre). For every real number $x$ and every positive integer $n$,

$$\cos(nx) + i\sin(nx) = (\cos x + i\sin x)^n.$$
Proof. We prove De Moivre’s theorem by induction on $n \ge 1$. The base step
$n = 1$ is obviously true. For the inductive step,

$$\begin{aligned}
(\cos x + i\sin x)^{n+1} &= (\cos x + i\sin x)^n(\cos x + i\sin x) \\
&= [\cos(nx) + i\sin(nx)](\cos x + i\sin x) && \text{(inductive hypothesis)} \\
&= \cos(nx + x) + i\sin(nx + x) && \text{(addition formula)} \\
&= \cos([n + 1]x) + i\sin([n + 1]x). \quad \bullet
\end{aligned}$$

Example 1.22.

Let us find the value of $(\cos 3° + i\sin 3°)^{40}$. By De Moivre’s theorem,

$$(\cos 3° + i\sin 3°)^{40} = \cos 120° + i\sin 120° = -\tfrac{1}{2} + i\tfrac{\sqrt{3}}{2}. \quad ◄$$
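Example 1.22 can be confirmed numerically. This is a floating-point sketch, so the two sides agree only up to rounding error; the function name `de_moivre_sides` is hypothetical.

```python
import math


def de_moivre_sides(x: float, n: int) -> tuple[complex, complex]:
    """Return (cos(nx) + i sin(nx), (cos x + i sin x)**n) for x in radians."""
    left = complex(math.cos(n * x), math.sin(n * x))
    right = complex(math.cos(x), math.sin(x)) ** n
    return left, right


# Example 1.22: x = 3 degrees and n = 40.
left, right = de_moivre_sides(math.radians(3), 40)
expected = complex(-0.5, math.sqrt(3) / 2)
```

Both `left` and `right` should be within rounding error of `expected`, that is, of $-\frac{1}{2} + i\frac{\sqrt{3}}{2}$.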

Here are the double and triple angle formulas. 

Corollary 1.23.

(i) $\cos(2x) = \cos^2 x - \sin^2 x = 2\cos^2 x - 1$
$\sin(2x) = 2\sin x\cos x$.

(ii) $\cos(3x) = \cos^3 x - 3\cos x\sin^2 x = 4\cos^3 x - 3\cos x$
$\sin(3x) = 3\cos^2 x\sin x - \sin^3 x = 3\sin x - 4\sin^3 x$.

Proof.

(i)

$$\begin{aligned}
\cos(2x) + i\sin(2x) &= (\cos x + i\sin x)^2 \\
&= \cos^2 x + 2i\sin x\cos x + i^2\sin^2 x \\
&= \cos^2 x - \sin^2 x + i(2\sin x\cos x).
\end{aligned}$$

Equating real and imaginary parts gives both double angle formulas.

(ii) De Moivre’s theorem gives

$$\begin{aligned}
\cos(3x) + i\sin(3x) &= (\cos x + i\sin x)^3 \\
&= \cos^3 x + 3i\cos^2 x\sin x + 3i^2\cos x\sin^2 x + i^3\sin^3 x \\
&= \cos^3 x - 3\cos x\sin^2 x + i(3\cos^2 x\sin x - \sin^3 x).
\end{aligned}$$

Equality of the real parts gives $\cos(3x) = \cos^3 x - 3\cos x\sin^2 x$; the second
formula for $\cos(3x)$ follows by replacing $\sin^2 x$ by $1 - \cos^2 x$. Equality of the
imaginary parts gives $\sin(3x) = 3\cos^2 x\sin x - \sin^3 x = 3\sin x - 4\sin^3 x$; the
second formula arises by replacing $\cos^2 x$ by $1 - \sin^2 x$. •
Corollary 1.23 will be generalized in Proposition 1.24. If $f_2(x) = 2x^2 - 1$,
then

$$\cos(2x) = 2\cos^2 x - 1 = f_2(\cos x),$$

and if $f_3(x) = 4x^3 - 3x$, then

$$\cos(3x) = 4\cos^3 x - 3\cos x = f_3(\cos x).$$


Proposition 1.24. For all $n \ge 1$, there is a polynomial $f_n(x)$ having all coefficients
integers such that

$$\cos(nx) = f_n(\cos x).$$

Proof. By De Moivre’s theorem,

$$\cos(nx) + i\sin(nx) = (\cos x + i\sin x)^n = \sum_{r=0}^{n} \binom{n}{r} (\cos x)^{n-r} (i\sin x)^r.$$

The real part of the left side, $\cos(nx)$, must be equal to the real part of the right
side. Now $i^r$ is real if and only if$^{13}$ $r$ is even, and so

$$\cos(nx) = \sum_{r \text{ even}} \binom{n}{r} (\cos x)^{n-r} (i\sin x)^r.$$

If $r = 2k$, then $i^r = i^{2k} = (-1)^k$, and

$$\cos(nx) = \sum_{k=0}^{\lfloor n/2 \rfloor} (-1)^k \binom{n}{2k} (\cos x)^{n-2k} \sin^{2k} x$$

($\lfloor n/2 \rfloor$ denotes the largest integer $m$ with $m \le n/2$).$^{14}$ But $\sin^{2k} x = (\sin^2 x)^k =
(1 - \cos^2 x)^k$, which is a polynomial in $\cos x$. This completes the proof. •

It is not difficult to show that $f_n(x)$ begins with $2^{n-1}x^n$. A sine version of
Proposition 1.24 can be found in Exercise 1.31 on page 33.

13 The converse of an implication “If P is true, then Q is true” is the implication “If Q is
true, then P is true.” An implication may be true without its converse being true. For example,
“If $a = b$, then $a^2 = b^2$.” The phrase if and only if means that both the statement and its
converse are true.

14 $\lfloor x \rfloor$, called the floor of $x$ or the greatest integer in $x$, is the largest integer $m$ with $m \le x$.
For example, $\lfloor 3 \rfloor = 3$ and $\lfloor \pi \rfloor = 3$.
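The sum in the proof of Proposition 1.24 can be expanded mechanically to produce the integer coefficients of $f_n$. The following is a minimal sketch, assuming a polynomial is stored as a list of coefficients indexed by degree; the names `poly_mul`, `cosine_polynomial`, and `evaluate` are hypothetical.

```python
from math import comb, cos


def poly_mul(p: list[int], q: list[int]) -> list[int]:
    """Multiply two polynomials given as coefficient lists (index = degree)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def cosine_polynomial(n: int) -> list[int]:
    """Coefficients [c_0, ..., c_n] of f_n, where cos(n x) = f_n(cos x)."""
    total = [0] * (n + 1)
    for k in range(n // 2 + 1):
        # Expand (1 - t**2)**k, which replaces sin(x)**(2k) with t = cos x.
        term = [1]
        for _ in range(k):
            term = poly_mul(term, [1, 0, -1])
        factor = (-1) ** k * comb(n, 2 * k)
        shift = n - 2 * k  # multiplication by t**(n - 2k)
        for degree, c in enumerate(term):
            total[degree + shift] += factor * c
    return total


def evaluate(coeffs: list[int], t: float) -> float:
    """Evaluate the polynomial with the given coefficients at t."""
    return sum(c * t**d for d, c in enumerate(coeffs))
```

For $n = 2$ and $n = 3$ this reproduces $2x^2 - 1$ and $4x^3 - 3x$, as lists `[-1, 0, 2]` and `[0, -3, 0, 4]`.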
We are now going to present a beautiful formula discovered by Euler, but we
begin by recalling some power series formulas from calculus to see how it arises.
For every real number $x$,

$$e^x = 1 + x + \frac{x^2}{2!} + \cdots + \frac{x^n}{n!} + \cdots,$$

$$\cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \cdots + \frac{(-1)^n x^{2n}}{(2n)!} + \cdots,$$

and

$$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots + \frac{(-1)^n x^{2n+1}}{(2n + 1)!} + \cdots.$$

One can define convergence of any power series $\sum_{n=0}^{\infty} c_n z^n$, where $z$ and $c_n$
are complex numbers, and one can show that the series

$$1 + z + \frac{z^2}{2!} + \cdots + \frac{z^n}{n!} + \cdots$$

converges for every complex number $z$; the complex exponential $e^z$ is defined
to be the sum of this series.

Euler’s Theorem. For all real numbers $x$,

$$e^{ix} = \cos x + i\sin x.$$

Proof. (Sketch) Now

$$e^{ix} = 1 + ix + \frac{(ix)^2}{2!} + \cdots + \frac{(ix)^n}{n!} + \cdots.$$

As $n$ varies over $0, 1, 2, 3, \ldots$, the powers of $i$ repeat every four steps: that is,
$i^n$ takes values

$$1, i, -1, -i, 1, i, -1, -i, 1, \ldots.$$

Thus, the even powers of $ix$ do not involve $i$, whereas the odd powers do. Collecting
terms, one has $e^{ix} = \text{even terms} + \text{odd terms}$, where

$$\text{even terms} = 1 + \frac{(ix)^2}{2!} + \frac{(ix)^4}{4!} + \cdots = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \cdots = \cos x$$

and

$$\text{odd terms} = ix + \frac{(ix)^3}{3!} + \frac{(ix)^5}{5!} + \cdots = i\left(x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots\right) = i\sin x.$$

Therefore, $e^{ix} = \cos x + i\sin x$. •
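A partial sum of the exponential series illustrates the theorem numerically. This is only a sketch: the function name `exp_series` and the default of 40 terms are arbitrary choices, and truncating the series gives an approximation, not the exact value.

```python
import math


def exp_series(z: complex, terms: int = 40) -> complex:
    """Sum the first `terms` terms of 1 + z + z**2/2! + z**3/3! + ..."""
    total = 0j
    term = 1 + 0j  # the n = 0 term, z**0 / 0!
    for n in range(terms):
        total += term
        term *= z / (n + 1)  # turn z**n / n! into z**(n+1) / (n+1)!
    return total


x = 1.2
series_value = exp_series(1j * x)
trig_value = complex(math.cos(x), math.sin(x))
```

The same function applied to $i\pi$ returns a value within rounding error of $-1$.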

It is said that Euler was especially pleased with the equation 

$$e^{\pi i} = -1;$$


indeed, this formula is inscribed on his tombstone. 

As a consequence of Euler’s theorem, the polar decomposition can be rewritten
in exponential form: Every complex number $z$ has a factorization

$$z = re^{i\theta},$$

where $r \ge 0$ and $0 \le \theta < 2\pi$.

The addition theorem and De Moivre’s theorem can be restated in complex
exponential form. The first becomes

$$e^{ix}e^{iy} = e^{i(x+y)};$$

the second becomes

$$(e^{ix})^n = e^{inx}.$$

Definition. If $n \ge 1$ is an integer, then an $n$th root of unity is a complex number
$\zeta$ with $\zeta^n = 1$.

Corollary 1.25. Every $n$th root of unity $\zeta$ is equal to

$$e^{2\pi ik/n} = \cos(2\pi k/n) + i\sin(2\pi k/n),$$

for some $k$ with $0 \le k \le n - 1$.

Proof. If $\zeta = \cos(2\pi/n) + i\sin(2\pi/n)$, then De Moivre’s theorem, Theorem 1.21,
gives

$$\begin{aligned}
\zeta^n &= [\cos(2\pi/n) + i\sin(2\pi/n)]^n \\
&= \cos(n2\pi/n) + i\sin(n2\pi/n) \\
&= \cos(2\pi) + i\sin(2\pi) \\
&= 1,
\end{aligned}$$

so that $\zeta$ is an $n$th root of unity. Finally, if $k$ is an integer, then $\zeta^n = 1$ implies
$(\zeta^k)^n = (\zeta^n)^k = 1^k = 1$, and so $\zeta^k = \cos(2\pi k/n) + i\sin(2\pi k/n)$ is also an
$n$th root of unity.

Conversely, assume that $\zeta$ is an $n$th root of unity. By the polar decomposition,
Proposition 1.18, we have $\zeta = \cos\theta + i\sin\theta$ (because $|\zeta| = 1$).
By De Moivre’s theorem, $1 = \zeta^n = \cos n\theta + i\sin n\theta$. Since $\cos\theta = 1$ if
and only if $\theta = 2k\pi$ for some integer $k$, we have $n\theta = 2k\pi$; that is, $\zeta =
\cos(2k\pi/n) + i\sin(2k\pi/n)$. It is clear that we may choose $k$ so that $0 \le k < n$
because $\cos x$ is periodic with period $2\pi$. •

Corollary 1.20 states that $|zw| = |z|\,|w|$ for any complex numbers $z$ and $w$.
It follows that if $\zeta$ is an $n$th root of unity, then $1 = |\zeta^n| = |\zeta|^n$, so that $|\zeta| = 1$
and $\zeta$ lies on the unit circle. Given a positive integer $n$, let $\theta = 2\pi/n$ and let
$\zeta = e^{i\theta}$. The polar coordinates of $\zeta$ are $(1, \theta)$, the polar coordinates of $\zeta^2$ are
$(1, 2\theta)$, the polar coordinates of $\zeta^3$ are $(1, 3\theta)$, . . . , the polar coordinates of $\zeta^{n-1}$
are $(1, (n - 1)\theta)$, and the polar coordinates of $\zeta^n = 1$ are $(1, n\theta) = (1, 0)$. Thus,
the $n$th roots of unity are evenly spaced around the unit circle. Figure 1.8 shows
the 8th roots of unity (here, $\theta = 2\pi/8 = \pi/4$).


Figure 1.8 8th Roots of Unity


Just as there are two square roots of a number $a$, namely, $\sqrt{a}$ and $-\sqrt{a}$, there
are $n$ different $n$th roots of $a$, namely, $e^{2\pi ik/n}\sqrt[n]{a}$ for $k = 0, 1, \ldots, n - 1$. For
example, the cube roots of unity are 1,

$$\zeta = \cos 120° + i\sin 120° = -\tfrac{1}{2} + i\tfrac{\sqrt{3}}{2}$$

and

$$\zeta^2 = \cos 240° + i\sin 240° = -\tfrac{1}{2} - i\tfrac{\sqrt{3}}{2}.$$

There are 3 cube roots of 2, namely, $\sqrt[3]{2}$, $\zeta\sqrt[3]{2}$, and $\zeta^2\sqrt[3]{2}$.
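The roots described here are easy to list numerically. The sketch below uses `cmath.exp` for $e^{2\pi ik/n}$; the function names are hypothetical, `nth_roots` assumes $a$ is a positive real number, and all results are floating-point approximations.

```python
import cmath
import math


def roots_of_unity(n: int) -> list[complex]:
    """Return e^(2 pi i k / n) for k = 0, 1, ..., n - 1."""
    return [cmath.exp(2j * math.pi * k / n) for k in range(n)]


def nth_roots(a: float, n: int) -> list[complex]:
    """Return the n complex nth roots of a positive real number a."""
    real_root = a ** (1.0 / n)  # the positive real nth root of a
    return [zeta * real_root for zeta in roots_of_unity(n)]


cube_roots_of_unity = roots_of_unity(3)
cube_roots_of_two = nth_roots(2.0, 3)
```

Here `cube_roots_of_unity[1]` approximates $-\frac{1}{2} + i\frac{\sqrt{3}}{2}$, and each entry of `cube_roots_of_two` cubes to approximately 2.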
Every $n$th root of unity is, of course, a root of the polynomial $x^n - 1$. Therefore,

$$x^n - 1 = \prod_{\zeta^n = 1} (x - \zeta).$$

If $\zeta$ is an $n$th root of unity, and if $n$ is the smallest positive integer for which
$\zeta^n = 1$, we say that $\zeta$ is a primitive $n$th root of unity. For example, $\zeta = e^{2\pi i/n}$
is a primitive $n$th root of unity. Now $i$ is an 8th root of unity, for $i^8 = 1$; it is not
a primitive 8th root of unity, but it is a primitive 4th root of unity.


Lemma 1.26. Let $\zeta$ be a primitive $d$th root of unity. If $\zeta^n = 1$, then $d$ must be
a divisor of $n$.

Proof. By long division, $n/d = q + r/d$, where $q$ and $r$ are natural numbers
and $0 \le r/d < 1$; that is, $n = qd + r$, where $0 \le r < d$. But

$$1 = \zeta^n = \zeta^{qd+r} = \zeta^{qd}\zeta^r = \zeta^r,$$

because $\zeta^{qd} = (\zeta^d)^q = 1$. If $r \ne 0$, we contradict $d$ being the smallest exponent
for which $\zeta^d = 1$. Hence, $n = qd$, as claimed. •


Definition. If $d$ is a positive integer, then the $d$th cyclotomic$^{15}$ polynomial is
defined by

$$\Phi_d(x) = \prod (x - \zeta),$$

where $\zeta$ ranges over all the primitive $d$th roots of unity.

In Proposition 3.47, we will prove that all the coefficients of $\Phi_d(x)$ are integers.

The following result is almost obvious. 


Proposition 1.27. For every integer $n \ge 1$,

$$x^n - 1 = \prod_{d \mid n} \Phi_d(x),$$

where $d$ ranges over all the positive divisors $d$ of $n$ [in particular, $\Phi_1(x)$ and
$\Phi_n(x)$ occur].

15 The roots of $x^n - 1$ are the $n$th roots of unity: $1, \zeta, \zeta^2, \ldots, \zeta^{n-1}$, where $\zeta = e^{2\pi i/n} =
\cos(2\pi/n) + i\sin(2\pi/n)$. Now these roots divide the unit circle $\{z \in \mathbb{C} : |z| = 1\}$ into $n$ equal
arcs (see Figure 1.8). This explains the term cyclotomic, for its Greek origin means “circle
splitting.”
Proof. In light of Corollary 1.25, the proposition follows by collecting, for each
divisor $d$ of $n$, all terms in the equation $x^n - 1 = \prod (x - \zeta)$ with $\zeta$ a primitive
$d$th root of unity. •

For example, if $p$ is a prime, then $x^p - 1 = \Phi_1(x)\Phi_p(x)$. Since $\Phi_1(x) =
x - 1$, it follows that

$$\Phi_p(x) = x^{p-1} + x^{p-2} + \cdots + x + 1.$$

Definition. The Euler $\phi$-function is the degree of the $n$th cyclotomic polynomial:

$$\phi(n) = \deg(\Phi_n(x)).$$

In Proposition 1.39, we will give another description of the Euler $\phi$-function
that does not depend on roots of unity.

Corollary 1.28. For every integer $n \ge 1$, we have

$$n = \sum_{d \mid n} \phi(d).$$

Proof. Note that $\phi(n)$ is the degree of $\Phi_n(x)$, and use the fact that the degree
of a product of polynomials is the sum of the degrees of the factors. •
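Since $\phi(n)$ counts the primitive $n$th roots of unity, Corollary 1.28 can be checked for small $n$ by counting. The sketch below works with exponents rather than complex numbers: with $\zeta = e^{2\pi i/n}$, Lemma 1.26 shows that $(\zeta^k)^m = 1$ exactly when $n$ divides $mk$. The function names are hypothetical, and the brute-force search is meant only for small inputs.

```python
def order_of_power(k: int, n: int) -> int:
    """Smallest m >= 1 with (zeta**k)**m == 1, where zeta = e^(2 pi i / n)."""
    m = 1
    # (zeta**k)**m == 1 exactly when n divides m * k; m = n always works.
    while (m * k) % n != 0:
        m += 1
    return m


def phi(n: int) -> int:
    """Count the primitive nth roots of unity among zeta**0, ..., zeta**(n-1)."""
    return sum(1 for k in range(n) if order_of_power(k, n) == n)


def divisors(n: int) -> list[int]:
    """Return the positive divisors of n in increasing order."""
    return [d for d in range(1, n + 1) if n % d == 0]
```

For example, `sum(phi(d) for d in divisors(12))` adds $1 + 1 + 2 + 2 + 2 + 4$ and returns 12.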

Where do the names of the trigonometric functions come from? The circle
in Figure 1.9 is the unit circle, and so the coordinates of the point $A$ are
$(\cos\alpha, \sin\alpha)$; that is, $|OD| = \cos\alpha$ and $|AD| = \sin\alpha$. The reader may show
that $|BC| = \tan\alpha$ (the Latin word tangere means “to touch,” and a tangent is
a line which touches the circle in only one point), and that $|OB| = \sec\alpha$ (the
Latin word secare means “to cut,” and a secant is a line that cuts a circle). The
complement of an acute angle $\alpha$ is $90° - \alpha$, and so the name cosine arises from
that of sine because of the identity $\cos\alpha = \sin(90° - \alpha)$.

Figure 1.9 Etymology of Trigonometric Names

The reason for the term sine is more amusing. We see in Figure 1.9 that

$$\sin\alpha = |AD| = \tfrac{1}{2}|AE|;$$

that is, $\sin\alpha$ is half the length of the chord $AE$. The fifth century Indian mathematician
Aryabhata called the sine ardha-jya (half chord) in Sanskrit, which was
later abbreviated to jya. A few centuries later, books in Arabic transliterated jya
as jiba. In Arabic script, there are letters and diacritical marks; roughly speaking,
the letters correspond to our consonants, while the diacritical marks correspond
to our vowels. It is customary to suppress diacritical marks in writing;
for example, the Arabic version of jiba is written jb (using Arabic characters,
of course). Now jiba, having no other meaning in Arabic, eventually evolved
into jaib, which is an Arabic word, meaning “bosom of a dress” (a fine word, but
having absolutely nothing to do with half-chord). Finally, Gherardo of Cremona,
ca. 1150, translated jaib into its Latin equivalent, sinus. And this is why sine is
so called, for sinus means bosom!

As long as we are discussing etymology, why is a root so called? Just as
the Greeks called the bottom side of a triangle its base (as in the area formula
$\frac{1}{2}$ altitude × base), they also called the bottom side of a square its base. A natural
question for the Greeks was: Given a square of area A, what is the length of its
side? Of course, the answer is $\sqrt{A}$. Were we inventing a word for $\sqrt{A}$, we might
have called it the base of A or the side of A. Similarly, consider the analogous
three-dimensional question: given a cube of volume V, what is the length of its
edge? The answer $\sqrt[3]{V}$ might be called the cube base of V, and $\sqrt{A}$ might then
be called the square base of A. Why, then, do we call these numbers cube root
and square root? What has any of this to do with plants?

Since tracing the etymology of words is not a simple matter, we only suggest 
the following explanation. Through about the fourth and fifth centuries, most 
mathematics was written in Greek, but, by the fifth century, India had become 
a center of mathematics, and important mathematical texts were also written in 
Sanskrit. The Sanskrit term for square root is pada. Both Sanskrit and Greek are 
Indo-European languages, and the Sanskrit word pada is a cognate of the Greek 
word podos; both mean base in the sense of the foot of a pillar or, as above, the
bottom of a square. In both languages, however, there is a secondary meaning: 
the root of a plant. In translating from Sanskrit, Arab mathematicians chose the 
secondary meaning, perhaps in error (Arabic is not an Indo-European language), 
perhaps for some unknown reason. For example, the influential book by
al-Khwarizmi, Al-jabr w’al muqabala,¹⁶ which appeared in the year 830, used the
Arabic word jidhr, meaning root of a plant. (The word algebra is a European 
version of the first word in the title of this book; the author’s name has also 
come into the English language as the word algorithm.) This mistranslation has
since been handed down through the centuries; the term jidhr became standard
in Arabic mathematical writings, and European translations from Arabic into
Latin used the word radix (meaning root, as in radish or radical). The notation
r2 for $\sqrt{2}$ occurs in European writings from about the twelfth century (but the
square root symbol did not arise from the letter r; it evolved from an old dot
notation). However, there was a competing notation in use at the same time, for
some scholars who translated directly from the Greek denoted $\sqrt{2}$ by l2, where
l abbreviates the Latin word latus, meaning side. Finally, with the invention of
logarithms in the 1500s, r won out over l, for the notation l2 was then commonly
used to denote log 2. The passage from square root to cube root to the root
of a polynomial equation other than $x^2 - a$ and $x^3 - a$ is a natural enough
generalization. Thus, as pleasant as it would be, there seems to be no botanical 
connection with roots of equations. 


Exercises 


1.24 Prove that the binomial theorem holds for complex numbers: if u and v are
complex numbers, then

\[ (u + v)^n = \sum_{r=0}^{n} \binom{n}{r} u^{n-r} v^{r}. \]

*1.25 Show that the binomial coefficients are “symmetric”:

\[ \binom{n}{r} = \binom{n}{n-r} \]

for all r with $0 \le r \le n$.

*1.26 Show, for every n, that the sum of the binomial coefficients is $2^n$:

\[ \binom{n}{0} + \binom{n}{1} + \cdots + \binom{n}{n} = 2^n. \]

1.27 (i) Show, for every $n \ge 1$, that the “alternating sum” of the binomial
coefficients is zero:

\[ \binom{n}{0} - \binom{n}{1} + \binom{n}{2} - \cdots + (-1)^n \binom{n}{n} = 0. \]

(ii) Use part (i) to prove, for a given n, that the sum of all the binomial
coefficients $\binom{n}{r}$ with r even is equal to the sum of all those $\binom{n}{r}$ with r
odd.

¹⁶ One can translate this title from Arabic, but the words already had a technical meaning:
both jabr and muqabala refer to certain operations akin to subtracting the same number from
both sides of an equation.

1.28 Prove that if $n \ge 2$, then

\[ \sum_{r=1}^{n} (-1)^{r-1} r \binom{n}{r} = 0. \]


*1.29 If 0 < r < n, prove that 

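1.30 Let $\varepsilon_1, \ldots, \varepsilon_n$ be complex numbers with $|\varepsilon_j| = 1$ for all j, where $n \ge 2$.

(i) Prove that

\[ \Bigl|\sum_{j=1}^{n} \varepsilon_j\Bigr| \le \sum_{j=1}^{n} |\varepsilon_j| = n. \]

(ii) Prove that there is equality,

\[ \Bigl|\sum_{j=1}^{n} \varepsilon_j\Bigr| = n, \]

if and only if all the $\varepsilon_j$ are equal.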

*1.31 For all odd $n \ge 1$, prove that there is a polynomial $g_n(x)$, all of whose coefficients
are integers, such that

\[ \sin(nx) = g_n(\sin x). \]

1.32 (Star of David) Prove, for all $n > r \ge 1$, that

\[ \binom{n-1}{r-1}\binom{n}{r+1}\binom{n+1}{r} = \binom{n-1}{r}\binom{n}{r-1}\binom{n+1}{r+1}. \]



1.33 (i) What is the coefficient of $x^{16}$ in $(1 + x)^{20}$?

(ii) How many ways are there to choose 4 colors from a palette containing 
paints of 20 different colors? 

1.34 Give at least two different proofs that a set X with n elements has exactly $2^n$
subsets.

1.35 A weekly lottery asks you to select 5 different numbers between 1 and 45. At the 
week’s end, 5 such numbers are drawn at random, and you win the jackpot if all 
your numbers match the drawn numbers. What is your chance of winning? 
Definition. Define the nth derivative $f^{(n)}(x)$ of a function $f(x)$ inductively: set
$f^{(0)}(x) = f(x)$ and, if $n \ge 0$, define $f^{(n+1)}(x) = (f^{(n)})'(x)$.

1.36 Assume that “term-by-term” differentiation holds for power series: if
$f(x) = c_0 + c_1 x + c_2 x^2 + \cdots + c_n x^n + \cdots$, then the power series for the derivative $f'(x)$ is

\[ f'(x) = c_1 + 2c_2 x + 3c_3 x^2 + \cdots + n c_n x^{n-1} + \cdots. \]

(i) Prove that $f(0) = c_0$.

(ii) Prove, for all $n \ge 0$, that

\[ f^{(n)}(x) = n!\,c_n + (n+1)!\,c_{n+1}x + x^2 g_n(x), \]

where $g_n(x)$ is some power series.

(iii) Prove that $c_n = f^{(n)}(0)/n!$ for all $n \ge 0$. (Of course, this is Taylor’s
formula.)

*1.37 (Leibniz) A function $f : \mathbb{R} \to \mathbb{R}$ is called a $C^\infty$-function if it has an nth derivative
$f^{(n)}(x)$ for every $n \ge 0$. Prove that if f and g are $C^\infty$-functions, then

\[ (fg)^{(n)}(x) = \sum_{k=0}^{n} \binom{n}{k} f^{(k)}(x) \cdot g^{(n-k)}(x). \]

1.38 (i) If $z = a + ib \ne 0$, prove that

\[ \frac{1}{z} = \frac{a}{a^2 + b^2} - i\,\frac{b}{a^2 + b^2}. \]

(ii) If z lies on the unit circle, prove that $z^{-1} = \bar{z}$.

1.39 Find $\sqrt{i}$.

*1.40 (i) If $z = r[\cos\theta + i\sin\theta]$, show that

\[ w = \sqrt[n]{r}\,[\cos(\theta/n) + i\sin(\theta/n)] \]

is an nth root of z, where $r \ge 0$.

(ii) Show that every nth root of z has the form $\zeta^k w$, where $\zeta$ is a primitive
nth root of unity and $k = 0, 1, 2, \ldots, n - 1$.

1.41 (i) Find $\sqrt{8 + 15i}$.

(ii) Find all the fourth roots of $8 + 15i$.


1.3 Greatest Common Divisors 

This is an appropriate time to introduce notation for some popular sets of numbers
other than $\mathbb{Z}$ (denoting the integers) and $\mathbb{N}$ (denoting the natural numbers).

$\mathbb{Q}$ = the set of all rational numbers (or fractions), that is, all numbers of the form
a/b, where a and b are integers and $b \ne 0$ (after the word quotient)
$\mathbb{R}$ = the set of all real numbers
$\mathbb{C}$ = the set of all complex numbers

Long division involves dividing an integer b by a nonzero integer a, giving

\[ \frac{b}{a} = q + \frac{r}{a}, \]

where q is an integer and $0 \le r/a < 1$. We clear denominators to get a statement
wholly in $\mathbb{Z}$.

Theorem 1.29 (Division Algorithm). Given integers a and b with $a \ne 0$,
there exist unique integers q and r with

\[ b = qa + r \quad\text{and}\quad 0 \le r < |a|. \]

Proof. We will prove the theorem in the special case in which $a > 0$ and $b \ge 0$;
Exercise 1.42 on page 51 asks the reader to complete the proof. Long division
involves finding the largest integer q with $qa \le b$, which is the same thing as
finding the smallest nonnegative integer of the form $b - qa$. We formalize this.

The set C of all nonnegative integers of the form $b - na$, where $n \ge 0$, is
not empty because it contains $b = b - 0a$ (we are assuming that $b \ge 0$). By the
Least Integer Axiom, C contains a smallest element, say, $r = b - qa$ (for some
$q \ge 0$); of course, $r \ge 0$, by its definition. If $r \ge a$, then

\[ b - (q + 1)a = b - qa - a = r - a \ge 0. \]

Hence, $r - a = b - (q + 1)a$ is an element of C that is smaller than r, contradicting
r being the smallest integer in C. Therefore, $0 \le r < a$.

It remains to prove the uniqueness of q and r. Suppose that $b = qa + r =
q'a + r'$, where $0 \le r, r' < a$, so that

\[ (q - q')a = r' - r. \]

We may assume that $r' \ge r$, so that $r' - r \ge 0$ and hence $q - q' \ge 0$. If $q \ne q'$,
then $q - q' \ge 1$ (for $q - q'$ is an integer); thus, since $a > 0$,

\[ (q - q')a \ge a. \]

On the other hand, since $r' < a$, Proposition A.2 gives

\[ r' - r < a - r \le a. \]

Therefore, $(q - q')a \ge a$ and $r' - r < a$, contradicting the given equation
$(q - q')a = r' - r$. We conclude that $q = q'$ and hence $r = r'$. •
Definition. If a and b are integers with $a \ne 0$, then the integers q and r
occurring in the division algorithm are called the quotient and the remainder after
dividing b by a.

For example, there are only two possible remainders after dividing by 2, 
namely, 0 and 1. A number m is even if the remainder is 0; m is odd if the
remainder is 1. Thus, either $m = 2q$ or $m = 2q + 1$.

Warning! The division algorithm makes sense, in particular, when b is negative.
A careless person may assume that b and −b leave the same remainder
after dividing by a, and this is usually false. For example, let us divide 60 and
−60 by 7.

\[ 60 = 7 \cdot 8 + 4 \quad\text{and}\quad -60 = 7 \cdot (-9) + 3. \]

Thus, the remainders after dividing 60 and −60 by 7 are different (see
Exercise 1.77 on page 71).
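
The warning can be checked numerically. The following is a minimal sketch, not part of the text: the function name `division` is a hypothetical choice, and it relies on the fact that Python's built-in `divmod` returns a nonnegative remainder whenever the divisor is positive.

```python
def division(b, a):
    """Return (q, r) with b == q*a + r and 0 <= r < |a|; assumes a != 0."""
    if a == 0:
        raise ValueError("a must be nonzero")
    # For a positive divisor, divmod already yields 0 <= r < |a|.
    q, r = divmod(b, abs(a))
    if a < 0:
        q = -q  # b = q*|a| + r = (-q)*a + r when a is negative
    return q, r

print(division(60, 7))    # (8, 4)
print(division(-60, 7))   # (-9, 3)
```

The two remainders, 4 and 3, differ, in agreement with the displayed equations.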

The next result shows that there is no largest prime. 

Corollary 1.30. There are infinitely many primes. 

Proof. (Euclid) Suppose, on the contrary, that there are only finitely many
primes. If $p_1, p_2, \ldots, p_k$ is the complete list of all the primes, define $M =
(p_1 \cdots p_k) + 1$. By Theorem 1.2, M is either a prime or a product of primes. But
M is neither a prime ($M > p_i$ for every i) nor does it have any prime divisor
$p_i$, for dividing M by $p_i$ gives remainder 1 and not 0. For example, dividing
M by $p_1$ gives $M = p_1(p_2 \cdots p_k) + 1$, so that the quotient and remainder are
$q = p_2 \cdots p_k$ and $r = 1$; dividing M by $p_2$ gives $M = p_2(p_1 p_3 \cdots p_k) + 1$, so
that $q = p_1 p_3 \cdots p_k$ and $r = 1$; and so forth. The assumption that there are only
finitely many primes leads to a contradiction, and so there must be an infinite
number of them. •
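
As a small numerical illustration (a sketch only; the list of six primes is an arbitrary choice made here, not taken from the text), one can form M from the first few primes and observe the remainders. In this instance M is not itself prime, but its prime factors lie outside the list, which is all the argument needs.

```python
from math import prod

primes = [2, 3, 5, 7, 11, 13]          # an arbitrary finite list of primes
M = prod(primes) + 1                   # M = 2*3*5*7*11*13 + 1
remainders = [M % p for p in primes]   # remainder after dividing M by each p

print(M, remainders)   # 30031 [1, 1, 1, 1, 1, 1]
print(M == 59 * 509)   # True: M factors into primes missing from the list
```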

An algorithm solving a problem is a set of directions which gives the correct 
answer after a finite number of steps, never at any stage leaving the user in doubt 
as to what to do next. The division algorithm is an algorithm in this sense: one 
starts with a and b and ends with q and r. The appendix at the end of the book 
treats algorithms more formally, using pseudocodes, which are general directions 
that can easily be translated into a programming language. For example, here is 
a pseudocode for the division algorithm. 

Input: b ≥ a > 0
Output: q, r
q := 0; r := b
WHILE r ≥ a DO
    r := r − a
    q := q + 1
END WHILE
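
The pseudocode translates almost line for line into a programming language. Here is a minimal Python sketch (the function name is a hypothetical choice); note that the loop must continue as long as r ≥ a, so that the final remainder satisfies 0 ≤ r < a.

```python
def divide_by_subtraction(b, a):
    """Quotient and remainder of b by a via repeated subtraction.

    Assumes b >= a > 0, as in the pseudocode.
    """
    q, r = 0, b
    while r >= a:   # stop only once r < a
        r -= a
        q += 1
    return q, r

print(divide_by_subtraction(326, 78))   # (4, 14)
```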
Definition. If a and b are integers, then a is a divisor of b if there is an integer
d with b = ad (synonyms are a divides b and also b is a multiple of a). We
denote this by

\[ a \mid b. \]

Note that $3 \mid 6$, because 6 = 3 × 2, but that $3 \nmid 5$ (that is, 3 does not divide 5):
even though $5 = 3 \times \frac{5}{3}$, the fraction $\frac{5}{3}$ is not an integer. The numbers ±1 and ±b
are divisors of any integer b. We always have $b \mid 0$ (because 0 = b × 0); on the
other hand, if $0 \mid b$, then b = 0 (because there is some d with b = 0 × d = 0).

If a and b are integers with $a \ne 0$, then a is a divisor of b if and only if the
remainder r given by the division algorithm is 0. If a is a divisor of b, then the 
remainder r given by the division algorithm is 0; conversely, if the remainder r 
is 0, then a is a divisor of b. 


Definition. A common divisor of integers a and b is an integer c with $c \mid a$ and
$c \mid b$. The greatest common divisor of a and b, denoted by gcd(a, b) [or, more
briefly, by (a, b)], is defined by

\[ \gcd(a, b) = \begin{cases} 0 & \text{if } a = 0 = b \\ \text{the largest common divisor of } a \text{ and } b & \text{otherwise.} \end{cases} \]

The notation (a, b) for the gcd is, obviously, the same notation used for the
ordered pair. The reader should have no difficulty understanding the intended 
meaning from the context in which the symbol occurs. 

If a and m are positive integers with $a \mid m$, say, m = ab, we claim that
$a \le m$. Since 0 < b, we have $1 \le b$, because b is an integer, and so $a \le ab = m$.
It follows that gcd’s always exist.

If c is a common divisor of a and b, then so is — c. Since one of ±c is 
nonnegative, the gcd is always nonnegative. It is easy to check that if at least one 
of a and b is nonzero, then (a, b) > 0. 


Proposition 1.31. If p is a prime and b is any integer, then

\[ (p, b) = \begin{cases} p & \text{if } p \mid b \\ 1 & \text{otherwise.} \end{cases} \]

Proof. A common divisor c of p and b is, of course, a divisor of p. But the
only positive divisors of p are p and 1, and so (p, b) = p or 1; it is p if $p \mid b$,
and it is 1 otherwise. •
Definition. A linear combination of integers a and b is an integer of the form

sa + tb, 


where s and t are integers. 

The next result is one of the most useful properties of gcd’s. 

Theorem 1.32. If a and b are integers, then gcd(a, b) is a linear combination
of a and b.

Proof. We may assume that at least one of a and b is not zero (otherwise, the
gcd is 0 and the result is obvious). Consider the set I of all the linear
combinations:

\[ I = \{sa + tb : s, t \text{ in } \mathbb{Z}\}. \]

Both a and b are in I (take s = 1 and t = 0 or vice versa). It follows that I
contains positive integers (if $a \ne 0$, then I contains ±a), and hence the set P of
all those positive integers that lie in I is nonempty. By the Least Integer Axiom,
P contains a smallest positive integer, say, d, which we claim is the gcd.

Since d is in I, it is a linear combination of a and b: there are integers s and 
t with 

d = sa + tb. 

Let us show that d is a common divisor by trying to divide each of a and b
by d. The division algorithm gives a = qd + r, where $0 \le r < d$. If r > 0, then

\[ r = a - qd = a - q(sa + tb) = (1 - qs)a + (-qt)b \ \text{ is in } P, \]

contradicting d being the smallest element of P. Hence r = 0 and $d \mid a$; a
similar argument shows that $d \mid b$.

Finally, if c is a common divisor of a and b, then a = ca' and b = cb', so
that c divides d, for d = sa + tb = c(sa' + tb'). But if $c \mid d$, then $|c| \le d$, and
so d is the gcd of a and b. •
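
The characterization used in this proof, that the gcd is the smallest positive linear combination, can be tested by brute force. The sketch below is illustrative only: the function name and the search window `bound` are assumptions made here, and the search is reliable only when suitable coefficients s and t lie inside that window.

```python
from math import gcd

def smallest_positive_combination(a, b, bound=50):
    """Smallest positive s*a + t*b with |s|, |t| <= bound (brute-force search).

    Assumes a and b are not both zero.
    """
    values = [s * a + t * b
              for s in range(-bound, bound + 1)
              for t in range(-bound, bound + 1)]
    return min(v for v in values if v > 0)

print(smallest_positive_combination(326, 78), gcd(326, 78))   # 2 2
```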

If d = gcd(a, b) and if c is a common divisor of a and b, then $c \le d$. The
next result shows that more is true: $c \mid d$ for every common divisor c.

Corollary 1.33. Let a and b be integers. A nonnegative common divisor d is
their gcd if and only if $c \mid d$ for every common divisor c.

Proof. Necessity (i.e., the implication ⇒) That every common divisor c of a
and b is a divisor of d = sa + tb has already been proved at the end of the proof
of Theorem 1.32.

Sufficiency (i.e., the implication ⇐) Let d denote the gcd of a and b, and let
d' be a nonnegative common divisor divisible by every common divisor c. Thus,
$d' \le d$, because $c \le d$ for every common divisor c. On the other hand, d itself
is a common divisor, and so $d \mid d'$, by hypothesis. Hence, $d \le d'$, and so d = d'. •


The proof of Theorem 1.32 contains an idea that will be used again. 

Corollary 1.34. Let I be a subset of $\mathbb{Z}$ such that

(i) 0 is in I;

(ii) if a and b are in I, then a − b is in I;

(iii) if a is in I and q is in $\mathbb{Z}$, then qa is in I.

Then there is a nonnegative integer d in I with I consisting precisely of all the
multiples of d.

Proof. If I consists of only the single integer 0, take d = 0. If I contains a
nonzero integer a, then (−1)a = −a is in I, by (iii). Thus, I contains ±a, one
of which is positive. By the Least Integer Axiom, I contains a smallest positive
integer; call it d.

We claim that every element a in I is a multiple of d. The division algorithm
gives integers q and r with a = qd + r, where $0 \le r < d$. Since d is in I, so is
qd, by (iii), and so (ii) gives r = a − qd in I. But r < d, the smallest positive
element of I, and so r = 0; thus, a is a multiple of d. •

The next result is of great interest, for it gives one of the most important 
characterizations of prime numbers. Euclid’s lemma is used frequently (at least 
ten times in this chapter alone), and an analog of it for irreducible polynomials 
is equally important. Looking further ahead, this lemma motivates the notion of 
prime ideal. 

Theorem 1.35 (Euclid’s Lemma). If p is a prime and $p \mid ab$, then $p \mid a$ or
$p \mid b$. More generally, if a prime p divides a product $a_1 a_2 \cdots a_n$, then it must
divide at least one of the factors $a_i$. Conversely, if $m \ge 2$ is an integer such that
$m \mid ab$ always implies $m \mid a$ or $m \mid b$, then m is a prime.

Proof. Assume that $p \nmid a$; that is, p does not divide a; we must show that
$p \mid b$. Now the gcd (p, a) = 1, by Proposition 1.31. By Theorem 1.32, there are
integers s and t with 1 = sp + ta, and so

\[ b = spb + tab. \]

Since $p \mid ab$, we have ab = pc for some integer c, so that b = spb + tpc =
p(sb + tc) and $p \mid b$. The second statement now follows easily by induction on
$n \ge 2$.

We prove the contrapositive: if m is composite, then there is a product ab
divisible by m, yet neither factor is divisible by m. Since m is composite, m =
ab, where a < m and b < m. Thus, m divides ab, but m divides neither factor
(if $m \mid a$, then $m \le a$). •

Here is a concrete illustration showing that Euclid’s lemma is not true in
general: $6 \mid 12 = 4 \times 3$, but $6 \nmid 4$ and $6 \nmid 3$.


Proposition 1.36. If p is a prime, then $p \mid \binom{p}{j}$ for $0 < j < p$.

Proof. Recall that

\[ \binom{p}{j} = \frac{p!}{j!\,(p-j)!} = \frac{p(p-1)\cdots(p-j+1)}{j!}. \]

Cross multiplying gives

\[ j!\,\binom{p}{j} = p(p-1)\cdots(p-j+1), \]

so that $p \mid j!\binom{p}{j}$. If $p \mid j!$, then Euclid’s lemma says that p would have to divide
some factor $1, 2, \ldots, j$ of $j!$. Since $0 < j < p$, each factor of $j!$ is strictly less
than p, and so p is not a divisor of any of them. Therefore, $p \nmid j!$. As $p \mid j!\binom{p}{j}$,
Euclid’s lemma now shows that p must divide $\binom{p}{j}$. •

Notice that the assumption that p is prime is needed; for example, $\binom{4}{2} = 6$,
but $4 \nmid 6$.
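
A quick numerical check of Proposition 1.36 is sketched below (the function name is a hypothetical choice). It tests, for each n, whether n divides every binomial coefficient $\binom{n}{j}$ with 0 < j < n, using `math.comb` from the Python standard library.

```python
from math import comb

def divides_all_middle_binomials(n):
    """True if n divides C(n, j) for every j with 0 < j < n."""
    return all(comb(n, j) % n == 0 for j in range(1, n))

print(divides_all_middle_binomials(7))   # True, since 7 is prime
print(divides_all_middle_binomials(4))   # False, since C(4, 2) = 6
```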


Definition. Call integers a and b relatively prime if their gcd is 1.

Thus, a and b are relatively prime if their only common divisors are ±1; 
moreover, 1 is a linear combination of a and b. For example, 2 and 3 are rela- 
tively prime, as are 8 and 15. 

Here is a generalization of Euclid’s lemma having the same proof. 


Corollary 1.37. Let a, b, and c be integers. If c and a are relatively prime and
if $c \mid ab$, then $c \mid b$.

Proof. By hypothesis, ab = cd for some integer d. There are integers s and t
with 1 = sc + ta, and so b = scb + tab = scb + tcd = c(sb + td). •

We see that it is important to know proofs: Corollary 1.37 does not follow 
from the statement of Euclid’s lemma, but it does follow from its proof. 
Definition. An expression a/b for a rational number (where a and b are integers)
is in lowest terms if a and b are relatively prime.


Lemma 1.38. Every nonzero rational number r has an expression in lowest 
terms. 

Proof. Since r is rational, r = a/b for integers a and b. If d = (a, b), then
a = a'd, b = b'd, and a/b = a'd/b'd = a'/b'. But (a', b') = 1, for if d' > 1
is a common divisor of a' and b', then d'd > d is a larger common divisor of a
and b. •
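
The proof is constructive: divide numerator and denominator by their gcd. A minimal sketch follows, assuming b ≠ 0; the function name is a hypothetical choice, and `math.gcd` is used here only as a convenient way to obtain (a, b).

```python
from math import gcd

def lowest_terms(a, b):
    """Return (a', b') with a/b == a'/b' and gcd(a', b') == 1; assumes b != 0."""
    d = gcd(a, b)           # d = (a, b), always nonnegative
    return a // d, b // d   # a = a'd and b = b'd

print(lowest_terms(78, 326))   # (39, 163)
```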

Here is a description of the Euler $\phi$-function that does not mention
cyclotomic polynomials.


Proposition 1.39. If $n \ge 1$ is an integer, then $\phi(n)$ is the number of integers k
with $1 \le k \le n$ and (k, n) = 1.

Proof. It suffices to prove that $e^{2\pi i k/n}$ is a primitive nth root of unity if and
only if k and n are relatively prime.

If k and n are not relatively prime, then n = dr and k = ds, where d, r,
and s are integers, and d > 1; it follows that r < n. Hence, $k/n = ds/dr = s/r$, so
that $(e^{2\pi i k/n})^r = (e^{2\pi i s/r})^r = 1$, and hence $e^{2\pi i k/n}$ is not a primitive nth root of
unity.

Conversely, suppose that $\zeta = e^{2\pi i k/n}$ is not a primitive nth root of unity.
Lemma 1.26 says that $\zeta$ must be a dth root of unity for some divisor d of n with
d < n; that is, writing n = dr (so that r > 1), there is $1 \le m \le d$ with

\[ \zeta = e^{2\pi i k/n} = e^{2\pi i m/d} = e^{2\pi i mr/dr} = e^{2\pi i mr/n}. \]

Since both k and mr are in the range between 1 and n, it follows that k = mr (if
$0 < x, y \le 1$ and $e^{2\pi i x} = e^{2\pi i y}$, then x = y); that is, r is a divisor of k and of n,
and so k and n are not relatively prime. •
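
Proposition 1.39 gives a direct, if slow, way to compute $\phi(n)$: count the integers k with 1 ≤ k ≤ n and (k, n) = 1. The sketch below (function names are hypothetical choices) does this and also checks the identity of Corollary 1.28, $n = \sum_{d \mid n} \phi(d)$, for small n.

```python
from math import gcd

def phi(n):
    """Euler phi-function: the number of k in 1..n with gcd(k, n) == 1."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

def divisor_sum_of_phi(n):
    """Sum of phi(d) over all positive divisors d of n."""
    return sum(phi(d) for d in range(1, n + 1) if n % d == 0)

print([phi(n) for n in range(1, 13)])   # [1, 1, 2, 2, 4, 2, 6, 4, 6, 4, 10, 4]
print(all(divisor_sum_of_phi(n) == n for n in range(1, 200)))   # True
```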


Proposition 1.40. $\sqrt{2}$ is irrational.

Proof. Suppose, on the contrary, that $\sqrt{2}$ is rational; that is, $\sqrt{2} = a/b$. We
may assume that a/b is in lowest terms; that is, (a, b) = 1. Squaring, $a^2 = 2b^2$.
By Euclid’s lemma,¹⁷ $2 \mid a$, so that 2m = a, hence $4m^2 = a^2 = 2b^2$, and
$2m^2 = b^2$. Euclid’s lemma now gives $2 \mid b$, contradicting (a, b) = 1. •

¹⁷ This proof can be made more elementary; one needs only Proposition 1.11.
This last result is significant in the history of mathematics. The ancient
Greeks defined number to mean positive integer, while (positive) rational numbers
were viewed as “ratios” a : b (which we can interpret as fractions a/b).
That $\sqrt{2}$ is irrational was a shock to the Pythagoreans (around 600 BC), for it
told them that $\sqrt{2}$ could not be defined in terms of numbers (positive integers)
alone. On the other hand, they knew that the diagonal of a square having sides
of length 1 has length $\sqrt{2}$. Thus, there is no numerical solution to the equation
$x^2 = 2$, but there is a geometric solution. By the time of Euclid (around 325 BC),
this problem was resolved by splitting mathematics into two different disciplines:
algebra and geometry. This resolution is probably one of the main reasons that
the golden age of classical mathematics declined in Europe after the rise of the
Roman Empire. For example, there were geometric ways of viewing addition,
subtraction, multiplication, and division of segments (see Theorem 4.46), but it
was virtually impossible to do any algebra. A sophisticated geometric argument
(due to Eudoxus and given in Euclid’s Elements) was needed to prove the version
of cross-multiplication saying that if a : b = c : d, then a : c = b : d.

We quote van der Waerden, Science Awakening, page 125: 

Nowadays we say that the length of the diagonal is the “irrational 
number” $\sqrt{2}$, and we feel superior to the poor Greeks who “did
not know irrationals.” But the Greeks knew irrational ratios very well
. . . That they did not consider $\sqrt{2}$ as a number was not a result of
ignorance, but of strict adherence to the definition of number. Arith- 
mos means quantity, therefore whole number. Their logical rigor did 
not even allow them to admit fractions; they replaced them by ratios 
of integers. 

For the Babylonians, every segment and every area simply repre- 
sented a number . . . When they could not determine a square root 
exactly, they calmly accepted an approximation. Engineers and nat- 
ural scientists have always done this. But the Greeks were concerned 
with exact knowledge, with “the diagonal itself,” as Plato expresses 
it, not with an acceptable approximation. 

In the domain of numbers (positive integers), the equation $x^2 = 2$
cannot be solved, not even in that of ratios of numbers. But it is 
solvable in the domain of segments; indeed the diagonal of the unit 
square is a solution. Consequently, in order to obtain exact solutions 
of quadratic equations, we have to pass from the domain of num- 
bers (positive integers) to that of geometric magnitudes. Geometric 
algebra is valid also for irrational segments and is nevertheless an 
exact science. It is therefore logical necessity, not the mere delight 
in the visible, which compelled the Pythagoreans to transmute their 
algebra into a geometric form.

Even though the Greek definition of number is no longer popular, their di- 
chotomy still persists. For example, almost all American high schools teach one 
year of algebra followed by one year of geometry, instead of two years in which 
both subjects are developed together. The problem of defining number has arisen 
several times since the classical Greek era. In the 1500s, mathematicians had to 
deal with negative numbers and with complex numbers (see our discussion of 
cubic polynomials in Chapter 5); the description of real numbers generally
accepted today dates from the late 1800s. There are echoes of ancient Athens in our
time. L. Kronecker (1823-1891) wrote,

Die ganzen Zahlen hat der liebe Gott gemacht, alles andere ist Men- 
schenwerk. 

God created the integers; everything else is the work of Man. Even today some 
logicians argue for a new definition of number. 

Our discussion of gcd’s is incomplete. What is the gcd (12327, 2409)? To
ask the question another way, is the expression 2409/12327 in lowest terms? The
next result not only enables one to compute gcd’s efficiently, it also allows one
to compute integers s and t expressing the gcd as a linear combination.¹⁸ Before
giving the theorem, consider the following example. Since (2, 3) = 1, there are
integers s and t with 1 = 2s + 3t. A moment’s thought gives s = −1 and t = 1;
but another moment’s thought gives s = 2 and t = −1. We conclude that the
coefficients s and t expressing the gcd as a linear combination are not uniquely
determined. The algorithm below, however, always picks out a particular pair of
coefficients.


Theorem 1.41 (Euclidean Algorithm). Let a and b be positive integers. There
is an algorithm that finds the gcd d = (a, b), and there is an algorithm that finds
a pair of integers s and t with d = sa + tb.

Remark. The general case for arbitrary a and b follows from this, for (a, b) =
(|a|, |b|). ◄

¹⁸ Every positive integer is a product of primes, and this is used, in Proposition 1.52, to
compute gcd's. However, finding prime factorizations of large numbers is notoriously difficult;
indeed, it is the basic reason why public key cryptography is secure.

Proof. The idea is to keep repeating the division algorithm (we will show where
this idea comes from after the proof is completed). Let us set $b = r_0$ and $a = r_1$.
Repeated application of the division algorithm gives integers $q_i$, positive integers
$r_i$, and equations:

\[
\begin{aligned}
b &= q_1 a + r_2, & r_2 &< a \\
a = r_1 &= q_2 r_2 + r_3, & r_3 &< r_2 \\
r_2 &= q_3 r_3 + r_4, & r_4 &< r_3 \\
&\;\;\vdots & &\;\;\vdots \\
r_{n-3} &= q_{n-2} r_{n-2} + r_{n-1}, & r_{n-1} &< r_{n-2} \\
r_{n-2} &= q_{n-1} r_{n-1} + r_n, & r_n &< r_{n-1} \\
r_{n-1} &= q_n r_n
\end{aligned}
\]

(remember that all $q_j$ and $r_j$ are explicitly known from the division algorithm).
Notice that there is a last remainder; the procedure stops because the remainders
form a strictly decreasing sequence of nonnegative integers (indeed, the number
of steps needed is less than a; Proposition 1.43 gives a smaller bound on the
number of steps).

We use Corollary 1.33 to show that the last remainder $d = r_n$ is the gcd. Let
us rewrite the top equations of the Euclidean algorithm without subscripts.

\[
\begin{aligned}
b &= qa + r \\
a &= q'r + s.
\end{aligned}
\]

If c is a common divisor of a and b, then the first equation shows that $c \mid r$.
Going down to the second equation, we now know that $c \mid a$ and $c \mid r$, and
so $c \mid s$. Continuing down the list, we see that c divides every remainder; in
particular, $c \mid d$.

Let us now rewrite the bottom equations of the Euclidean algorithm without
subscripts.

\[
\begin{aligned}
f &= ug + h \\
g &= u'h + k \\
h &= u''k + d \\
k &= vd.
\end{aligned}
\]

Going from the bottom up, we have $d \mid k$ and $d \mid d$, so that $d \mid h$; going up again,
$d \mid h$ and $d \mid k$ imply $d \mid g$. Working upwards ultimately gives $d \mid a$ and $d \mid b$.
We conclude that d is a common divisor. But d = (a, b) because we saw, in the
preceding paragraph, that if c is any common divisor, then $c \mid d$.

We now find s and t, again working from the bottom up. Rewrite the equation
$h = u''k + d$ as $d = h - u''k$. Substituting in the next equation above,

\[ d = h - u''k = h - u''(g - u'h) = (1 + u''u')h - u''g, \]

so that d is a linear combination of g and h. Continue this procedure, replacing
h by $f - ug$, and so on, until d is written as a linear combination of a and b. •

We say that n is the number of steps in the Euclidean algorithm, for one does
not know whether $r_n$ in the (n − 1)st step $r_{n-2} = q_{n-1} r_{n-1} + r_n$ is the gcd until
the division algorithm is applied to $r_{n-1}$ and $r_n$.

Example 1.42. 

Find (326, 78), express it as a linear combination of 326 and 78, and write 
78/326 in lowest terms. 


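\[
\begin{aligned}
326 &= 4 \times 78 + 14 && (1) \\
78 &= 5 \times 14 + 8 && (2) \\
14 &= 1 \times 8 + 6 && (3) \\
8 &= 1 \times 6 + 2 && (4) \\
6 &= 3 \times 2. && (5)
\end{aligned}
\]

The Euclidean algorithm gives (326, 78) = 2.

We now express 2 as a linear combination of 326 and 78, working from the
bottom up using the equations above.

\[
\begin{aligned}
2 &= 8 - 1 \cdot 6 && \text{by Eq. (4)} \\
  &= 8 - 1 \cdot (14 - 1 \cdot 8) && \text{by Eq. (3)} \\
  &= 2 \cdot 8 - 1 \cdot 14 \\
  &= 2 \cdot (78 - 5 \cdot 14) - 1 \cdot 14 && \text{by Eq. (2)} \\
  &= 2 \cdot 78 - 11 \cdot 14 \\
  &= 2 \cdot 78 - 11 \cdot (326 - 4 \cdot 78) && \text{by Eq. (1)} \\
  &= 46 \cdot 78 - 11 \cdot 326;
\end{aligned}
\]

thus, s = 46 and t = −11.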

Dividing numerator and denominator by the gcd, namely, 2, gives 78/326 = 
39/163, and the last expression is in lowest terms. ◄ 
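
The computation in this example can be mechanized. The sketch below is one common way to do so, not the back-substitution used above: it carries the coefficients forward while the divisions are performed. The function name and variable names are hypothetical choices. For the pair (326, 78) it happens to return the same coefficients, −11 for 326 and 46 for 78.

```python
def extended_gcd(a, b):
    """Return (d, s, t) with d = gcd(a, b) = s*a + t*b, for positive integers a, b.

    Invariant: old_r == old_s*a + old_t*b and r == s*a + t*b at every stage.
    """
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r                    # one application of the division algorithm
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_r, old_s, old_t

d, s, t = extended_gcd(326, 78)
print(d, s, t)                # 2 -11 46
print(s * 326 + t * 78 == d)  # True
```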

The Greek terms for the Euclidean algorithm are antanairesis or anthyphairesis,
either of which may be freely translated as “back and forth subtraction.”
Exercise 1.56 on page 52 says that (b, a) = (b − a, a). If $b - a \ge a$,
repeat to get (b, a) = (b − a, a) = (b − 2a, a). Keep subtracting until a pair a
and b − qa (for some q) is reached with b − qa < a. Thus, if r = b − qa, where
$0 \le r < a$, then

\[ (b, a) = (b - a, a) = (b - 2a, a) = \cdots = (b - qa, a) = (r, a). \]

Now change direction: repeat the procedure beginning with the pair (r, a) =
(a, r), for a > r; eventually one reaches (d, 0) = d.

For example, antanairesis computes the gcd (326, 78) as follows: 

(326, 78) = (248, 78) = (170, 78) = (92, 78) = (14, 78). 

So far, we have been subtracting 78 from the other larger numbers. At this point, 
we now subtract 14 (this is the reciprocal aspect of antanairesis), for 78 > 14. 

(78, 14) = (64, 14) = (50, 14) = (36, 14) = (22, 14) = (8, 14). 

Again we change direction: 

(14, 8) = (6,8). 

Change direction once again to get (8, 6) = (2, 6), and change direction one last
time to get 

(6, 2) = (4, 2) = (2, 2) = (0, 2) = 2. 

Thus, gcd (326, 78) = 2. 

The division algorithm (which is just iterated subtraction!) is a more efficient 
way of performing antanairesis. There are four subtractions in the passage from 
(326, 78) to (14, 78); the division algorithm expresses this as 

326 = 4 x 78 + 14. 

There are then five subtractions in the passage from (78, 14) to (8, 14); the divi- 
sion algorithm expresses this as 

78 = 5 x 14 + 8. 

There is one subtraction in the passage from (14, 8) to (6, 8): 

14 = 1 x 8 + 6. 

There is one subtraction in the passage from (8, 6) to (2, 6): 

8 = 1 x 6 + 2, 

and there are three subtractions from (6, 2) to (0, 2) = 2: 

6 = 3 × 2.

These are the steps in the Euclidean algorithm. 
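
Antanairesis itself is easy to simulate. In the following sketch (the function name is a hypothetical choice) the larger entry of the pair is repeatedly reduced by the smaller, and the subtractions are counted; for (326, 78) the count is 4 + 5 + 1 + 1 + 3 = 14, the sum of the quotients appearing in the five divisions above.

```python
def antanairesis(b, a):
    """gcd of positive integers by back-and-forth subtraction.

    Returns (gcd, number of subtractions performed).
    """
    subtractions = 0
    while a != 0 and b != 0:
        if b >= a:
            b -= a      # subtract a from the larger entry b
        else:
            a -= b      # change direction: subtract b from a
        subtractions += 1
    return a + b, subtractions   # one entry is 0; the other is the gcd

print(antanairesis(326, 78))   # (2, 14)
```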

The Euclidean algorithm was one of the first algorithms for which an explicit 
bound on the number of its steps in a computation was given. The proof of this 
involves the Fibonacci sequence 

\[ F_0 = 0, \quad F_1 = 1, \quad F_n = F_{n-1} + F_{n-2} \ \text{ for all } n \ge 2. \]
Proposition 1.43 (Lamé’s¹⁹ Theorem). Let $b \ge a$ be positive integers, and
let $\delta(a)$ be the number of digits in the decimal expression of a. If n is the number
of steps in the Euclidean algorithm computing the gcd (b, a), then

\[ n \le 5\delta(a). \]

Proof. Let us denote b by $r_0$ and a by $r_1$ in the equations of the Euclidean
algorithm on page 44, so that every equation there has the form

\[ r_j = r_{j+1} q_{j+1} + r_{j+2} \]

except the last one, which is

\[ r_{n-1} = r_n q_n. \]

Note that $q_n \ge 2$: if $q_n \le 1$, then $r_{n-1} = q_n r_n \le r_n$, contradicting $r_n < r_{n-1}$.
Similarly, all $q_1, q_2, \ldots, q_{n-1} \ge 1$: otherwise $q_j = 0$ for some $j \le n - 1$, and
$r_{j-1} = r_{j+1}$, contradicting the inequalities $r_n < r_{n-1} < \cdots < r_2 < r_1 = a \le b = r_0$.
Now

\[ r_n \ge 1 = F_2 \]

and, since $q_n \ge 2$,

\[ r_{n-1} = r_n q_n \ge 2r_n \ge 2F_2 = 2 = F_3. \]

More generally, let us prove, by induction on $j \ge 0$, that

\[ r_{n-j} \ge F_{j+2}. \]

The inductive step is

\[
\begin{aligned}
r_{n-j-1} &= r_{n-j} q_{n-j} + r_{n-j+1} \\
&\ge r_{n-j} + r_{n-j+1} \qquad (\text{since } q_{n-j} \ge 1) \\
&\ge F_{j+2} + F_{j+1} = F_{j+3}.
\end{aligned}
\]

We conclude that $a = r_1 = r_{n-(n-1)} \ge F_{n-1+2} = F_{n+1}$. By Corollary 1.13,
$F_{n+1} > \alpha^{n-1}$, where $\alpha = \frac{1}{2}(1 + \sqrt{5})$, and so

\[ a > \alpha^{n-1}. \]

Now $\log_{10}\alpha \approx \log_{10}(1.62) \approx .208 > \frac{1}{5}$, so that

\[ \log_{10} a > (n - 1)\log_{10}\alpha > (n - 1)/5; \]

that is,

\[ n - 1 < 5\log_{10} a < 5\delta(a), \]

because $\delta(a) = \lfloor \log_{10} a \rfloor + 1$, and so $n \le 5\delta(a)$ because $\delta(a)$, hence $5\delta(a)$, is
an integer. •

¹⁹ This is an example in which a theorem's name is not that of its discoverer. Lamé’s proof
appeared in 1844. The earliest estimate for the number of steps in the Euclidean algorithm can
be found in a rare book by Simon Jacob, published around 1564. There were also estimates
by T. F. de Lagny in 1733, A-A-L Reynaud in 1821, E. Leger in 1837, and P-J-E Finck in
1841. (This earlier work is described in articles of P. Shallit and P. Schreiber, respectively, in
the journal Historia Mathematica.)

For example, Lamé’s theorem guarantees there are at most 10 steps needed 
to compute (326, 78) (actually, there are 5 steps). 
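
The bound can be checked numerically. The following Python sketch counts division steps and compares them with $5\delta(a)$; the names `euclid_steps` and `lame_bound` are our own, chosen for illustration. Consecutive Fibonacci numbers are included because they are the classical slow inputs for the algorithm.

```python
def euclid_steps(b, a):
    """Number of division steps the Euclidean algorithm uses for gcd(b, a)."""
    n = 0
    while a > 0:
        b, a = a, b % a
        n += 1
    return n


def lame_bound(a):
    """5 * delta(a), where delta(a) is the number of decimal digits of a."""
    return 5 * len(str(a))


# Build a short list of Fibonacci numbers F_0, F_1, ...
fib = [0, 1]
while len(fib) < 30:
    fib.append(fib[-1] + fib[-2])

for b, a in [(326, 78), (fib[12], fib[11]), (fib[29], fib[28])]:
    print(f"gcd({b}, {a}): {euclid_steps(b, a)} steps, bound {lame_bound(a)}")
```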

The usual notation for the integer 5754 is an abbreviation of 
5 x 10 3 + 7 x 10 2 + 5 x 10 + 4. 

The next result shows that there is nothing special about the number 10; any 
integer $b \ge 2$ can be used instead of 10. 

Proposition 1.44. If $b \ge 2$ is an integer, then every positive integer $m$ has an 
expression in base $b$: there are integers $d_i$ with $0 \le d_i < b$ such that 

$m = d_k b^k + d_{k-1} b^{k-1} + \cdots + d_0$; 

moreover, this expression is unique if $d_k \ne 0$. 

Remark. The numbers $d_k, \ldots, d_0$ are called the b-adic digits of $m$. ◄ 

Proof. Let $m$ be a positive integer; since $b \ge 2$, there are powers of $b$ larger 
than $m$. We prove, by induction on $k \ge 0$, that if $b^k \le m < b^{k+1}$, then $m$ has an 
expression 

$m = d_k b^k + d_{k-1} b^{k-1} + \cdots + d_0$ 

in base $b$. 

If $k = 0$, then $1 = b^0 \le m < b^1 = b$, and we may define $d_0 = m$. If $k > 0$, 
then the division algorithm gives integers $d_k$ and $r$ with $m = d_k b^k + r$, where 
$0 \le r < b^k$. Notice that $d_k < b$ (lest $m \ge b^{k+1}$) and $0 < d_k$ (lest $m < b^k$). 
If $r = 0$, define $d_0 = \cdots = d_{k-1} = 0$, and $m = d_k b^k$ is an expression in base 
$b$. If $r > 0$, then the inductive hypothesis shows that $r$ and, hence, $m$ has an 
expression in base $b$. 

Before proving uniqueness of the b-adic digits $d_i$, we first observe that if 
$0 \le d_i < b$ for all $i$, then 

$\sum_{i=0}^{k} d_i b^i < b^{k+1}$.    (6) 

Indeed, 

$\sum_{i=0}^{k} d_i b^i \le \sum_{i=0}^{k} (b - 1) b^i = \sum_{i=0}^{k} b^{i+1} - \sum_{i=0}^{k} b^i = b^{k+1} - 1 < b^{k+1}$. 

We now prove, by induction on $k \ge 0$, that if $b^k \le m < b^{k+1}$, then the 
b-adic digits $d_i$ in the expression $m = \sum_{i=0}^{k} d_i b^i$ are uniquely determined by $m$. 
Let $m = \sum_{i=0}^{k} d_i b^i = \sum_{i=0}^{k} c_i b^i$, where $0 \le d_i < b$ and $0 \le c_i < b$ for all $i$. 
Subtracting, we obtain 

$0 = \sum_{i=0}^{k} (d_i - c_i) b^i$. 

Eliminate any zero coefficients, and transpose all negative coefficients $d_i - c_i$, if 
any, to obtain an equation in which all coefficients are positive and in which the 
index sets $I$ and $J$ are disjoint: 

$L = \sum_{i \in I} (d_i - c_i) b^i = \sum_{j \in J} (c_j - d_j) b^j = R$. 

Let $p$ be the largest index in $I$ and let $q$ be the largest index in $J$. Since $I$ 
and $J$ are disjoint, we may assume that $q < p$. As the left side $L$ involves $b^p$ 
with a nonzero coefficient, we have $L \ge b^p$; but Eq. (6) shows that the right 
side $R < b^{q+1} \le b^p$, a contradiction. Therefore, the b-adic digits are uniquely 
determined. • 


Example 1.45. 

Let us follow the steps in the proof of Proposition 1.44 to write 12345 in base 7. 
First write the powers of 7 until 12345 is exceeded: 7; $7^2 = 49$; $7^3 = 343$; 
$7^4 = 2401$; $7^5 = 16807$. Repeated use of the division algorithm gives 

$12345 = 5 \times 7^4 + 340$    and $340 < 7^4 = 2401$; 
$340 = 0 \times 7^3 + 340$      and $340 < 7^3 = 343$; 
$340 = 6 \times 7^2 + 46$       and $46 < 7^2 = 49$; 
$46 = 6 \times 7 + 4$           and $4 < 7$; 
$4 = 4 \times 1$. 

The 7-adic digits of 12345 are thus 50664. 

In short, the first 7-adic digit (on the left) is the quotient $q$ (here, it is 5) after 
dividing by $7^k$, where $7^k \le m < 7^{k+1}$. The second digit is the quotient after 
dividing the remainder $m - q7^k$ by $7^{k-1}$. And so on; the 7-adic digits are the 
successive quotients. ◄ 

The most popular bases are b = 10 (giving everyday decimal digits), b = 2 
(giving binary digits, useful because a computer can interpret 1 as “on” and 0 as 
“off”), and b = 16 (hexadecimal, also for computers), but let us see that other 
bases can also be useful. 
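
The procedure of Example 1.45 works for any base. Here is a minimal Python sketch of it, assuming the top-down order used in the proof of Proposition 1.44 (find the largest power of b not exceeding m, then take successive quotients); the name `base_digits` is ours.

```python
def base_digits(m, b):
    """Return the b-adic digits of m, most significant digit first."""
    if m <= 0 or b < 2:
        raise ValueError("need m >= 1 and b >= 2")
    # Find k with b**k <= m < b**(k+1).
    k = 0
    while b ** (k + 1) <= m:
        k += 1
    digits = []
    for power in range(k, -1, -1):
        d, m = divmod(m, b ** power)  # the quotient is the next digit
        digits.append(d)
    return digits


print(base_digits(12345, 7))   # the digits found in Example 1.45
print(base_digits(5754, 10))
print(base_digits(12345, 2))
print(base_digits(12345, 16))  # hexadecimal digits are integers from 0 to 15
```

In practice one often extracts digits from the right instead (repeatedly dividing by b and collecting remainders); both orders give the same digits, by the uniqueness part of Proposition 1.44.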

Example 1.46. 

Here is a problem of Bachet de Méziriac from 1624. A merchant had a 40-pound 
weight that broke into 4 pieces. When the pieces were weighed, it was found that 
each piece was a whole number of pounds and that the four pieces could be used 
to weigh every integral weight between 1 and 40 pounds. What were the weights 
of the pieces? 

Weighing means using a balance scale having two pans, with weights being 
put on either pan. Thus, given weights of 1 and 3 pounds, one can weigh a 
2-pound weight □ by putting 1 and □ on one pan and 3 on the other pan. 

A solution to Bachet’s problem is 1, 3, 9, 27. If □ denotes a given integral 
weight, let us write the weights on one pan to the left of the semicolon and the 
weights on the other pan to the right of the semicolon. The number in boldface 
is the weight of □. The reader should note that Proposition 1.44 gives the 
uniqueness of the weights used in the pans. 

     1    1 ; □                     9    9 ; □ 
     2    3 ; 1, □                 10    9, 1 ; □ 
     3    3 ; □                    11    9, 3 ; 1, □ 
     4    3, 1 ; □                 12    9, 3 ; □ 
     5    9 ; 3, 1, □              13    9, 3, 1 ; □ 
     6    9 ; 3, □                 14    27 ; 9, 3, 1, □ 
     7    9, 1 ; 3, □              15    27 ; 9, 3, □ 
     8    9 ; 1, □ 

The reader may complete this table for □ ≤ 40. ◄ 

Example 1.47. 

Given a balance scale, the weight (as an integral number of pounds) of any person 
weighing at most 364 pounds can be found using only six lead weights. 

We begin by proving that every positive integer $m$ can be written 

$m = e_k 3^k + e_{k-1} 3^{k-1} + \cdots + 3e_1 + e_0$, 

where $e_i = -1$, 0, or 1. 

The idea is to modify the 3-adic expansion 

$m = d_k 3^k + d_{k-1} 3^{k-1} + \cdots + 3d_1 + d_0$, 

where $d_i = 0, 1, 2$, by “carrying.” If $d_0 = 0$ or 1, set $e_0 = d_0$ and leave $d_1$ alone. 
If $d_0 = 2$, set $e_0 = -1$, and replace $d_1$ by $d_1 + 1$ (we have merely substituted 
$3 - 1$ for 2). Now $1 \le d_1 + 1 \le 3$. If $d_1 + 1 = 1$, set $e_1 = 1$, and leave $d_2$ alone; 
if $d_1 + 1 = 2$, set $e_1 = -1$, and replace $d_2$ by $d_2 + 1$; if $d_1 + 1 = 3$, define $e_1 = 0$ 
and replace $d_2$ by $d_2 + 1$. Continue in this way (the ultimate expansion of $m$ may 
begin with either $e_k 3^k$ or $e_{k+1} 3^{k+1}$). Here is a table of the first few numbers in 
this new expansion (let us write $\bar{1}$ instead of $-1$). 

     1    $1$                         9    $100$ 
     2    $1\bar{1}$                 10    $101$ 
     3    $10$                       11    $11\bar{1}$ 
     4    $11$                       12    $110$ 
     5    $1\bar{1}\bar{1}$          13    $111$ 
     6    $1\bar{1}0$                14    $1\bar{1}\bar{1}\bar{1}$ 
     7    $1\bar{1}1$                15    $1\bar{1}\bar{1}0$ 
     8    $10\bar{1}$ 




The reader should now understand Example 1.46. If □ weighs $m$ pounds, 
write $m = \sum e_i 3^i$, where $e_i = 1$, 0, or $-1$, and then transpose those terms 
having negative coefficients. Those weights with $e_i = -1$ go on the pan with □, 
while those weights with $e_i = 1$ go on the other pan. 

The solution to the current weighing problem involves choosing as weights 
1, 3, 9, 27, 81, and 243 pounds. One can find the weight of anyone under 365 
pounds, because 1 + 3 + 9 + 27 + 81 + 243 = 364. ◄ 
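
The carrying procedure of Example 1.47 can be sketched in Python as follows. The function names `balanced_ternary` and `pans` are our own; the sketch works from the lowest digit upward, replacing a digit 2 by −1 and carrying 1, exactly as described above.

```python
def balanced_ternary(m):
    """Return coefficients e_k, ..., e_0 in {-1, 0, 1} with m = sum of e_i * 3**i."""
    if m <= 0:
        raise ValueError("need m >= 1")
    coeffs = []  # least significant coefficient first
    while m > 0:
        d = m % 3
        if d == 2:          # substitute 3 - 1 for 2 and carry 1
            coeffs.append(-1)
            m = m // 3 + 1
        else:
            coeffs.append(d)
            m //= 3
    return coeffs[::-1]


def pans(m, weights=(1, 3, 9, 27, 81, 243)):
    """Return (weights on the empty pan, weights placed with the object)."""
    coeffs = balanced_ternary(m)[::-1]
    if len(coeffs) > len(weights):
        raise ValueError("object too heavy for these weights")
    other_pan = [w for w, e in zip(weights, coeffs) if e == 1]
    with_object = [w for w, e in zip(weights, coeffs) if e == -1]
    return other_pan, with_object


print(balanced_ternary(5))   # 9 - 3 - 1
print(pans(5))
print(pans(364))
```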


Exercises 

*1.42 Given integers a and b (possibly negative) with $a \ne 0$, prove that there exist unique 
integers q and r with b = qa + r and $0 \le r < |a|$. 

1.43 Prove that $\sqrt{2}$ is irrational using Proposition 1.11 instead of Euclid’s lemma. 

1.44 Let $p_1, p_2, p_3, \ldots$ be the list of the primes in ascending order: $p_1 = 2$, $p_2 = 3$, 
$p_3 = 5$, . . . Define $f_k = p_1 p_2 \cdots p_k + 1$ for $k \ge 1$. Find the smallest k for which 
$f_k$ is not a prime. 

*1.45 Prove that if d and d' are nonzero integers, each of which divides the other, then 
d' = ±d. 

1.46 If $\zeta$ is a root of unity, prove that there is a positive integer d with $\zeta^d = 1$ such that 
whenever $\zeta^k = 1$, then $d \mid k$. 

1.47 Show that every positive integer m can be written as a sum of distinct powers of 2; 
show, moreover, that there is only one way in which m can so be written. 

1.48 Find the b-adic digits of 1000 for b = 2, 3, 4, 5, and 20. 

*1.49 (i) Prove that if n is squarefree (i.e., n > 1 and n is not divisible by the 
square of any prime), then $\sqrt{n}$ is irrational. 

(ii) Prove that $\sqrt[3]{2}$ is irrational. 

1.50 (i) Find the gcd d = (12327, 2409), find integers s and t with d = 12327s + 
2409t, and put the fraction 2409/12327 in lowest terms. 

(ii) Find the gcd d = (7563, 526), and express it as a linear combination of 
7563 and 526. 

(iii) Find gcd d = (7404621, 73122) and write it as a linear combination; that 
is, find integers s and t with d = 7404621s + 73122t. 

*1.51 Let a and b be integers, and let sa + tb = 1 for s, t in Z. Prove that a and b are 
relatively prime. 

1.52 If d = (a, b), prove that a/d and b/d are relatively prime. 

*1.53 Prove that if (r, m) = 1 = (r', m), then (rr', m) = 1. 

1.54 Assume that d = sa + tb is a linear combination of integers a and b. Find infinitely 
many pairs of integers $(s_k, t_k)$ with 

$d = s_k a + t_k b$. 

*1.55 If a and b are relatively prime and if each divides an integer n, then their product 
ab also divides n. 

*1.56 Prove, for any (possibly negative) integers a and b, that $(b, a) = (b - a, a)$. 

1.57 If a > 0, prove that a(b, c) = (ab, ac). [One must assume that a > 0 lest a(b, c) 
be negative.] 

1.58 Prove that the following pseudocode implements the Euclidean algorithm. 

Input: a, b 
Output: d 
d := b; s := a 
WHILE s > 0 DO 

rem := remainder after dividing d by s 
d := s 
s := rem 
END WHILE 

1.59 If $F_n$ denotes the nth term of the Fibonacci sequence 0, 1, 1, 2, 3, 5, 8, . . ., prove, 
for all $n \ge 1$, that $F_{n+1}$ and $F_n$ are relatively prime. 


Definition. A common divisor of integers $a_1, a_2, \ldots, a_n$ is an integer c with $c \mid a_i$ 
for all i; the largest of the common divisors, denoted by $(a_1, a_2, \ldots, a_n)$, is called the 
greatest common divisor. 


*1.60 (i) Show that if d is the greatest common divisor of $a_1, a_2, \ldots, a_n$, then 
$d = \sum t_i a_i$, where $t_i$ is in Z for $1 \le i \le n$. 

(ii) Prove that if c is a common divisor of $a_1, a_2, \ldots, a_n$, then $c \mid d$. 

1.61 (i) Show that (a, b, c), the gcd of a, b, c, is equal to (a, (b, c)). 

(ii) Compute (120, 168, 328). 

1.62 A Pythagorean triple is a triple (a, b, c) of positive integers for which 

$a^2 + b^2 = c^2$; 

it is called primitive if the gcd (a, b, c) = 1. 

(i) Consider a complex number z = q + ip, where q > p are positive 
integers. Prove that 

$(q^2 - p^2, 2qp, q^2 + p^2)$ 

is a Pythagorean triple by showing that $|z^2| = |z|^2$. [One can prove that 
every primitive Pythagorean triple (a, b, c) is of this type.] 

(ii) Show that the Pythagorean triple (9, 12, 15) (which is not primitive) is 
not of the type given in part (i). 

(iii) Using a calculator which can find square roots but which can display only 
8 digits, prove that 

(19597501, 28397460, 34503301) 

is a Pythagorean triple by finding q and p. 


1.4 The Fundamental Theorem of Arithmetic 

We have already seen, in Theorem 1.2, that every integer $a \ge 2$ is either a prime 
or a product of primes. We are now going to generalize Proposition 1.11 by 
showing that the primes in such a factorization and the number of times each of 
them occurs are uniquely determined by a. 


Theorem 1.48 (Fundamental Theorem of Arithmetic). Every integer $a \ge 2$ 
is a prime or a product of primes. Moreover, if a has factorizations 

$a = p_1 \cdots p_m$ and $a = q_1 \cdots q_n$, 

where the p’s and q’s are primes, then n = m and the q’s may be reindexed so 
that $q_i = p_i$ for all i. 

Proof. We prove the theorem by induction on $\ell$, the larger of m and n. 

Base step. If $\ell = 1$, then the given equation is $a = p_1 = q_1$, and the result 
is obvious. 

Inductive step. The equation gives $p_m \mid q_1 \cdots q_n$. By Theorem 1.35, Euclid’s 
lemma, there is some i with $p_m \mid q_i$. But $q_i$, being a prime, has no positive 
divisors other than 1 and itself, so that $q_i = p_m$. Reindexing, we may assume 
that $q_n = p_m$. Canceling, we have $p_1 \cdots p_{m-1} = q_1 \cdots q_{n-1}$. By the inductive 
hypothesis, $n - 1 = m - 1$ and the q’s may be reindexed so that $q_i = p_i$ for all i. • 

Corollary 1.49. If $a \ge 2$ is an integer, then there are distinct primes $p_i$, unique 
up to indexing, and unique integers $e_i > 0$ with 

$a = p_1^{e_1} \cdots p_n^{e_n}$. 

Proof. Just collect like terms in a prime factorization. • 

The uniqueness in the Fundamental Theorem of Arithmetic says that the 
exponents $e_1, \ldots, e_n$ in the prime factorization $a = p_1^{e_1} \cdots p_n^{e_n}$ are well-defined 
integers determined by a. 

It is sometimes convenient to allow factorizations $p_1^{e_1} \cdots p_n^{e_n}$ having some 
zero exponents, for this device allows us to use the same primes when factoring 
two given numbers. For example, $168 = 2^3 3^1 7^1$ and $60 = 2^2 3^1 5^1$ may be 
rewritten as $168 = 2^3 3^1 5^0 7^1$ and $60 = 2^2 3^1 5^1 7^0$. 


Corollary 1.50. Every positive rational number $r \ne 1$ has a unique factorization 

$r = p_1^{g_1} \cdots p_n^{g_n}$, 

where the $p_i$ are distinct primes and the $g_i$ are nonzero integers. Moreover, r is 
an integer if and only if $g_i > 0$ for all i. 

Proof. There are positive integers a and b with r = a/b. If $a = p_1^{e_1} \cdots p_n^{e_n}$ 
and $b = p_1^{f_1} \cdots p_n^{f_n}$, then $r = p_1^{g_1} \cdots p_n^{g_n}$, where $g_i = e_i - f_i$ (we may assume 
that the same primes appear in both factorizations by allowing zero exponents). 
The desired factorization is obtained if one deletes those factors $p_i^{g_i}$, if any, with 
$g_i = 0$. 

Suppose there were another such factorization 

$r = p_1^{h_1} \cdots p_n^{h_n}$ 

(by allowing zero exponents, we may again assume that the same primes occur 
in each factorization). Suppose that $g_j \ne h_j$ for some j; reindexing if necessary, 
we may assume that j = 1 and that $g_1 > h_1$. Therefore, 

$p_1^{g_1 - h_1} p_2^{g_2} \cdots p_n^{g_n} = p_2^{h_2} \cdots p_n^{h_n}$. 

This is an equation of rational numbers, for some of the exponents may be negative. 
Cross-multiplying gives an equation in Z whose left side involves the prime 
$p_1$ and whose right side does not; this contradicts the fundamental theorem of 
arithmetic. 

If all the exponents in the factorization of r are positive, then r is an integer 
because it is a product of integers. Conversely, if r is an integer, then it has a 
prime factorization in which all exponents are positive. • 

Lemma 1.51. Let positive integers a and b have prime factorizations 

$a = p_1^{e_1} \cdots p_n^{e_n}$ and $b = p_1^{f_1} \cdots p_n^{f_n}$, 

where $e_i, f_i \ge 0$ for all i. Then $a \mid b$ if and only if $e_i \le f_i$ for all i. 

Proof. If $e_i \le f_i$ for all i, then b = ac, where $c = p_1^{f_1 - e_1} \cdots p_n^{f_n - e_n}$. The 
number c is an integer because $f_i - e_i \ge 0$ for all i. Therefore, $a \mid b$. 

Conversely, if b = ac, let the prime factorization of c be $c = p_1^{g_1} \cdots p_n^{g_n}$, 
where $g_i \ge 0$ for all i. It follows from the Fundamental Theorem of Arithmetic 
that $e_i + g_i = f_i$ for all i, and so $f_i - e_i = g_i \ge 0$ for all i. • 


Definition. A common multiple of a, b is an integer m with $a \mid m$ and $b \mid m$. 
The least common multiple, denoted by lcm(a, b) (or, more briefly, by [a, b]), 
is the smallest positive common multiple if all $a, b \ne 0$, and it is 0 otherwise. 

More generally, if $n \ge 2$, a common multiple of $a_1, a_2, \ldots, a_n$ is an integer 
m with $a_i \mid m$ for all i. The least common multiple, denoted by $[a_1, a_2, \ldots, a_n]$, 
is the smallest positive common multiple if all $a_i \ne 0$, and it is 0 otherwise. 

We can now give a new description of gcd’s. 


Proposition 1.52. Let $a = p_1^{e_1} \cdots p_n^{e_n}$ and let $b = p_1^{f_1} \cdots p_n^{f_n}$, where $e_i, f_i \ge 0$ 
for all i; define 

$m_i = \min\{e_i, f_i\}$ and $M_i = \max\{e_i, f_i\}$. 

Then 

$\gcd(a, b) = p_1^{m_1} \cdots p_n^{m_n}$ and $\operatorname{lcm}(a, b) = p_1^{M_1} \cdots p_n^{M_n}$. 

Proof. Define $d = p_1^{m_1} \cdots p_n^{m_n}$. Lemma 1.51 shows that d is a (positive) common 
divisor of a and b; moreover, if c is any (positive) common divisor, then 
$c = p_1^{g_1} \cdots p_n^{g_n}$, where $0 \le g_i \le \min\{e_i, f_i\} = m_i$ for all i. Therefore, $c \mid d$. 

A similar argument shows that $D = p_1^{M_1} \cdots p_n^{M_n}$ is a common multiple that 
divides every other such. • 

For small numbers a and b, using their prime factorizations is a more efficient 
way to compute their gcd than using the Euclidean algorithm. For example, 
since $168 = 2^3 3^1 5^0 7^1$ and $60 = 2^2 3^1 5^1 7^0$, we have $(168, 60) = 2^2 3^1 5^0 7^0 = 12$ 
and $[168, 60] = 2^3 3^1 5^1 7^1 = 840$. As we mentioned when we introduced the 
Euclidean algorithm, finding the prime factorization of a large integer is very 
inefficient. 

Proposition 1.53. If a and b are positive integers, then 

lcm(a, b) gcd(a, b) = ab. 

Proof. The result follows from Proposition 1.52 if one uses the identity 

$m_i + M_i = e_i + f_i$, 

where $m_i = \min\{e_i, f_i\}$ and $M_i = \max\{e_i, f_i\}$. • 

Of course, this proposition allows us to compute the lcm as ab/(a, b). 
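
Propositions 1.52 and 1.53 can be tried out on small numbers with the following Python sketch. It assumes factorization by naive trial division, which is adequate only for small inputs; `factorize` and `gcd_lcm_from_factorizations` are our own illustrative names, and `math.gcd` from the standard library is used only as an independent check.

```python
from collections import Counter
from math import gcd


def factorize(a):
    """Prime factorization of a >= 1 as a Counter {prime: exponent} (trial division)."""
    factors = Counter()
    p = 2
    while p * p <= a:
        while a % p == 0:
            factors[p] += 1
            a //= p
        p += 1
    if a > 1:
        factors[a] += 1
    return factors


def gcd_lcm_from_factorizations(a, b):
    """Proposition 1.52: minimum exponents give the gcd, maximum exponents the lcm."""
    fa, fb = factorize(a), factorize(b)
    d = D = 1
    for p in set(fa) | set(fb):
        e, f = fa[p], fb[p]   # a Counter returns exponent 0 for a missing prime
        d *= p ** min(e, f)
        D *= p ** max(e, f)
    return d, D


d, D = gcd_lcm_from_factorizations(168, 60)
print(d, D, d * D == 168 * 60, d == gcd(168, 60))
```

The zero exponents mentioned before Corollary 1.50 appear here as the default value 0 that a `Counter` returns for a prime it has not seen.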


Exercises 


1.63 (i) Find gcd(210, 48) using factorizations into primes. 

(ii) Find gcd(1234, 5678). 

*1.64 (i) Prove that an integer m > 2 is a perfect square if and only if each of its 

prime factors occurs an even number of times. 

(ii) Prove that if m is a positive integer for which $\sqrt{m}$ is rational, then m is 
a perfect square. Conclude that if m is not a perfect square, then $\sqrt{m}$ is 
irrational. 

1.65 If a and b are positive integers with (a, b) = 1, and if ab is a square, prove that 
both a and b are squares. 

*1.66 Let $n = p^r m$, where p is a prime not dividing an integer $m \ge 1$. Prove that 

$p \nmid \binom{n}{p^r}$. 


1.67 Definition. If p is a prime, define the p-adic norm of a rational number a as 
follows: $\|0\|_p = 0$; if $a \ne 0$, then $a = p^e p_1^{e_1} \cdots p_n^{e_n}$, where $p, p_1, \ldots, p_n$ are 
distinct primes, and we set $\|a\|_p = p^{-e}$. 

(i) For all rationals a and b, prove that 

$\|ab\|_p = \|a\|_p \|b\|_p$ and $\|a + b\|_p \le \max\{\|a\|_p, \|b\|_p\}$. 

(ii) Define $\delta_p(a, b) = \|a - b\|_p$. 

(i) For all rationals a, b, prove $\delta_p(a, b) \ge 0$ and $\delta_p(a, b) = 0$ if and 
only if a = b; 

(ii) For all rationals a, b, prove that $\delta_p(a, b) = \delta_p(b, a)$; 

(iii) For all rationals a, b, c, prove $\delta_p(a, b) \le \delta_p(a, c) + \delta_p(c, b)$. 

(iii) If a and b are integers and $p^n \mid (a - b)$, then $\delta_p(a, b) \le p^{-n}$. (Thus, a 
and b are “close” if $a - b$ is divisible by a “large” power of p.) 

1.68 Let a and b be in Z. Prove that if $\delta_p(a, b) \le p^{-n}$, then a and b have the same first 
n p-adic digits, $d_0, \ldots, d_{n-1}$. 

1.69 Prove that an integer $M \ge 0$ is the lcm of $a_1, a_2, \ldots, a_n$ if and only if it is a 
common multiple of $a_1, a_2, \ldots, a_n$ which divides every other common multiple. 

*1.70 (i) Give another proof of Proposition 1.53, $[a, b](a, b) = |ab|$, without using 
the Fundamental Theorem of Arithmetic. 

(ii) Find [1371, 123]. 

1.5 Congruences 

When first learning long division, one emphasizes the quotient q; the remainder 
r is merely the fragment left over. There is now going to be a shift in viewpoint: 
we are interested in whether or not a given number b is a multiple of a number 
a , but we are not so interested in which multiple it may be. Hence, from now on, 
we will emphasize the remainder. 

Two integers a and b are said to have the same parity if they are both even or 
both odd. If a and b have the same parity, then $a - b$ is even: this is surely true if 
a and b are both even; if a and b are both odd, then $a = 2m + 1$, $b = 2n + 1$, and 
$a - b = 2(m - n)$ is even. Conversely, if $a - b$ is even, then we cannot have one 
of them even and the other odd lest $a - b$ be odd. The next definition generalizes 
this notion of parity, letting any positive integer m play the role of 2. 

Definition. If $m \ge 0$ is fixed, then integers a and b are congruent modulo m, 
denoted by 

$a \equiv b \bmod m$, 

if $m \mid (a - b)$. 

Usually, one assumes that the modulus $m \ge 2$ because the cases m = 0 and 
m = 1 are not very interesting: if a and b are integers, then $a \equiv b \bmod 0$ if and 
only if $0 \mid (a - b)$, that is, a = b, and so congruence mod 0 is ordinary equality. 
The congruence $a \equiv b \bmod 1$ is true for every pair of integers a and b because 
$1 \mid (a - b)$ always. Hence, every two integers are congruent mod 1. 

The word “modulo” is usually abbreviated to “mod.” The Latin root of this 
word means a standard of measure. Thus, the term modular unit is used today 
in architecture: a fixed length m is chosen, say, m = 1 foot, and plans are drawn 
so that the dimensions of every window, door, wall, etc., are integral multiples 
of m . 

If a and b are positive integers, then $a \equiv b \bmod 10$ if and only if they have 
the same last digit; more generally, $a \equiv b \bmod 10^n$ if and only if they have the same 
last n digits. For example, $526 \equiv 1926 \bmod 100$. 

London time is 6 hours later than Chicago time. What time is it in London if 
it is 10:00 A.M. in Chicago? Since clocks are set up with 12 hour cycles, this is 
really a problem about congruence mod 12. To solve it, note that 

$10 + 6 = 16 \equiv 4 \bmod 12$, 

and so it is 4:00 P.M. in London. 

The next theorem shows that congruence mod m behaves very much like 
equality. 

Proposition 1.54. If m > 0 is a fixed integer, then for all integers a, b, c, 

(i) $a \equiv a \bmod m$; 

(ii) if $a \equiv b \bmod m$, then $b \equiv a \bmod m$; 

(iii) if $a \equiv b \bmod m$ and $b \equiv c \bmod m$, then $a \equiv c \bmod m$. 

Remark. (i) says that congruence is reflexive, (ii) says it is symmetric, and (iii) 
says it is transitive. ◄ 

Proof. 

(i) Since $m \mid (a - a) = 0$, we have $a \equiv a \bmod m$. 

(ii) If $m \mid (a - b)$, then $m \mid -(a - b) = b - a$ and so $b \equiv a \bmod m$. 

(iii) If $m \mid (a - b)$ and $m \mid (b - c)$, then $m \mid [(a - b) + (b - c)] = a - c$, and so 
$a \equiv c \bmod m$. • 

We now generalize the observation that $a \equiv 0 \bmod m$ if and only if $m \mid a$. 

Proposition 1.55. Let m > 0 be a fixed integer. 

(i) If a = qm + r, then $a \equiv r \bmod m$. 

(ii) If $0 \le r' < r < m$, then r and r' are not congruent mod m; in symbols, 
$r \not\equiv r' \bmod m$. 

(iii) $a \equiv b \bmod m$ if and only if a and b leave the same remainder after dividing 
by m. 

Proof. 

(i) The equation $a - r = qm$ (from the division algorithm a = qm + r) shows 
that $m \mid (a - r)$. 

(ii) If $r \equiv r' \bmod m$, then $m \mid (r - r')$ and $m \le r - r'$. But $r - r' \le r < m$, 
a contradiction. 

(iii) If a = qm + r and b = q'm + r', where $0 \le r < m$ and $0 \le r' < m$, then 
$a - b = (q - q')m + (r - r')$; that is, 

$a - b \equiv r - r' \bmod m$. 

Therefore, if $a \equiv b \bmod m$, then $a - b \equiv 0 \bmod m$, hence $r - r' \equiv 0 \bmod m$, 
and $r \equiv r' \bmod m$; by (ii), r = r'. 

Conversely, if r = r', then a = qm + r and b = q'm + r, so that $a - b = 
(q - q')m$ and $a \equiv b \bmod m$. • 

Corollary 1.56. Given $m \ge 2$, every integer a is congruent mod m to exactly 
one of 0, 1, . . . , m − 1. 

Proof. The division algorithm says that $a \equiv r \bmod m$, where $0 \le r < m$; 
that is, r is an integer on the list 0, 1, . . . , m − 1. If a were congruent to two 
integers on the list, say, r and r', then $r \equiv r' \bmod m$, contradicting part (ii) of 
Proposition 1.55. Therefore, a is congruent to a unique such r. • 

We know that every integer a is either even or odd; that is, a has the form 
2k or 1 + 2k. We now see that if $m \ge 2$, then every integer a has exactly one of 
the forms km = 0 + km, 1 + km, 2 + km, . . . , (m − 1) + km; thus, congruence 
mod m generalizes the even/odd dichotomy from m = 2 to $m \ge 2$. Notice how 
we continue to focus on the remainder in the division algorithm and not upon the 
quotient. 

Congruence is compatible with addition and multiplication. 

Proposition 1.57. Let m > 0 be a fixed integer. 

(i) If $a_i \equiv a_i' \bmod m$ for i = 1, 2, . . . , n, then 

$a_1 + \cdots + a_n \equiv a_1' + \cdots + a_n' \bmod m$. 

In particular, if $a \equiv a' \bmod m$ and $b \equiv b' \bmod m$, then 

$a + b \equiv a' + b' \bmod m$. 

(ii) If $a_i \equiv a_i' \bmod m$ for i = 1, 2, . . . , n, then 

$a_1 \cdots a_n \equiv a_1' \cdots a_n' \bmod m$. 

In particular, if $a \equiv a' \bmod m$ and $b \equiv b' \bmod m$, then 

$ab \equiv a'b' \bmod m$. 

(iii) If $a \equiv b \bmod m$, then $a^n \equiv b^n \bmod m$ for all $n \ge 1$. 

Proof. 

(i) The proof is by induction on $n \ge 2$. For the base step, if $m \mid (a - a')$ and 
$m \mid (b - b')$, then $m \mid (a - a' + b - b') = (a + b) - (a' + b')$. Therefore, 
$a + b \equiv a' + b' \bmod m$. The proof of the inductive step is routine. 

(ii) The proof is by induction on $n \ge 2$. For the base step, we must show that if 
$m \mid (a - a')$ and $m \mid (b - b')$, then $m \mid (ab - a'b')$, and this follows from the 
identity 

$ab - a'b' = (ab - a'b) + (a'b - a'b')$ 
$= (a - a')b + a'(b - b')$. 

Therefore, $ab \equiv a'b' \bmod m$. The proof of the inductive step is routine. 

(iii) This is the special case of (ii) when $a_i = a$ and $a_i' = b$ for all i. • 

Let us repeat a warning given on page 36. A number and its negative usually 
have different remainders after being divided by a number m. For example, 
$60 = 7 \cdot 8 + 4$ and $-60 = 7 \cdot (-9) + 3$. In terms of congruences, 

$60 \equiv 4 \bmod 7$ while $-60 \equiv 3 \bmod 7$. 

In light of Proposition 1.55(i), if the remainder after dividing b by m is r and the 
remainder after dividing −b by m is s, then $b \equiv r \bmod m$ and $-b \equiv s \bmod m$. 
Therefore, Proposition 1.57(i) gives 

$r + s \equiv b - b = 0 \bmod m$. 

Thus, r + s = m, for 0 < r, s < m. For example, we have just seen that the 
remainders after dividing 60 and −60 by 7 are 4 and 3, respectively. If both a and 
−a have the same remainder r after dividing by m, then $-r \equiv r \bmod m$; that is, 
$2r \equiv 0 \bmod m$. Exercise 1.77 on page 71 asks you to solve this last congruence. 
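
The warning about negatives is easy to observe in Python, whose built-in `divmod` returns, for a positive modulus, the quotient and the remainder of the division algorithm (the remainder lies between 0 and m − 1). The wrapper `remainder` below is our own small helper, written only to make that convention explicit.

```python
def remainder(b, m):
    """Remainder r with b = q*m + r and 0 <= r < m, for a modulus m > 0."""
    q, r = divmod(b, m)
    assert b == q * m + r and 0 <= r < m
    return r


r = remainder(60, 7)     # 60 = 7 * 8 + 4
s = remainder(-60, 7)    # -60 = 7 * (-9) + 3
print(r, s, r + s)

# For every b, the two remainders add up to 0 or to m.
m = 7
print(all(remainder(b, m) + remainder(-b, m) in (0, m) for b in range(-100, 101)))
```

Some other programming languages return a negative remainder for a negative dividend, so this behaviour should not be assumed outside Python without checking.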

The next example shows how one can use congruences. In each case, the 
key idea is to solve a problem by replacing numbers by their remainders. 

Example 1.58. 

(i) Prove that if a is in Z, then $a^2 \equiv 0$, 1, or 4 mod 8. 

If a is an integer, then $a \equiv r \bmod 8$, where $0 \le r \le 7$; moreover, by 
Proposition 1.57(iii), $a^2 \equiv r^2 \bmod 8$, and so it suffices to look at the squares of 
the remainders. 

r              0    1    2    3    4    5    6    7 
r^2            0    1    4    9   16   25   36   49 
r^2 mod 8      0    1    4    1    0    1    4    1 

Table 1.1. Squares mod 8 

We see in Table 1.1 that only 0, 1, or 4 can be a remainder after dividing a perfect 
square by 8. 

(ii) Prove that n = 1003456789 is not a perfect square. 

Since $1000 = 8 \cdot 125$, we have $1000 \equiv 0 \bmod 8$, and so 

$1003456789 = 1003456 \cdot 1000 + 789 \equiv 789 \bmod 8$. 

Dividing 789 by 8 leaves remainder 5; that is, $n \equiv 5 \bmod 8$. Were n a perfect 
square, then $n \equiv 0$, 1, or 4 mod 8. 

(iii) If m and n are positive integers, are there any perfect squares of the form 
$3^m + 3^n + 1$? 

Again, let us look at remainders mod 8. Now $3^2 = 9 \equiv 1 \bmod 8$, and so we 
can evaluate $3^m \bmod 8$ as follows: if m = 2k, then $3^m = 3^{2k} = 9^k \equiv 1 \bmod 8$; 
if m = 2k + 1, then $3^m = 3^{2k+1} = 9^k \cdot 3 \equiv 3 \bmod 8$. Thus, 

$3^m \equiv 1 \bmod 8$ if m is even; 
$3^m \equiv 3 \bmod 8$ if m is odd. 

Replacing numbers by their remainders after dividing by 8, we have the following 
possibilities for the remainder of $3^m + 3^n + 1$, depending on the parities of 
m and n: 

$3 + 1 + 1 \equiv 5 \bmod 8$ 
$3 + 3 + 1 \equiv 7 \bmod 8$ 
$1 + 1 + 1 \equiv 3 \bmod 8$ 
$1 + 3 + 1 \equiv 5 \bmod 8$. 

In no case is the remainder 0, 1, or 4, and so no number of the form $3^m + 3^n + 1$ 
can be a perfect square, by part (i). ◄ 
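
The three parts of Example 1.58 can be replayed in Python. The sketch below uses the built-in three-argument `pow` for modular powers and `math.isqrt` (available from Python 3.8) as an independent check that the number in part (ii) is not a square; the variable names are ours.

```python
from math import isqrt

# Part (i): the possible remainders of a perfect square after dividing by 8.
squares_mod_8 = sorted({(r * r) % 8 for r in range(8)})
print(squares_mod_8)

# Part (ii): the remainder of n mod 8 is not on that list.
n = 1003456789
print(n % 8, n % 8 in squares_mod_8, isqrt(n) ** 2 == n)

# Part (iii): 3**m mod 8 depends only on the parity of m, so the four
# parity combinations of the exponents cover every case.
residues = sorted({(pow(3, m, 8) + pow(3, k, 8) + 1) % 8
                   for m in (1, 2) for k in (1, 2)})
print(residues)
```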

Every positive integer is congruent to either 0, 1, or 2 mod 3; hence, if $p \ne 3$ 
is a prime, then either $p \equiv 1 \bmod 3$ or $p \equiv 2 \bmod 3$. For example, 7, 13, and 19 
are congruent to 1 mod 3, while 2, 5, 11, and 17 are congruent to 2 mod 3. The next theorem 
is another illustration of the fact that a proof of one theorem may be adapted to 
prove another theorem. 

Proposition 1.59. There are infinitely many primes p with $p \equiv 2 \bmod 3$. 

Remark. This proposition is a special case of a beautiful theorem of Dirichlet 
about primes in arithmetic progressions: If a, b in N are relatively prime, then 
there are infinitely many primes of the form a + bn. In this proposition, we show 
that there are infinitely many primes of the form 2 + 3n. Even though the proof 
of this special case is not difficult, the proof of Dirichlet’s theorem uses complex 
analysis and it is deep. ◄ 

Proof. We mimic Euclid’s proof that there are infinitely many primes. Suppose, 
on the contrary, that there are only finitely many primes congruent to 2 mod 3; 
let them be $p_1, \ldots, p_s$. Consider the number 

$m = 1 + p_1^2 \cdots p_s^2$. 

Now $p_i \equiv 2 \bmod 3$ implies $p_i^2 \equiv 4 \equiv 1 \bmod 3$, and so $m \equiv 1 + 1 = 2 \bmod 3$. 
Since $m > p_i$ for all i, the number m is not prime, for it is not one of the $p_i$. 
Actually, none of the $p_i$ divide m: if we define $Q_i = p_1^2 \cdots p_{i-1}^2 \, p_i \, p_{i+1}^2 \cdots p_s^2$, 
then the uniqueness part of the division algorithm coupled with the equation 
$m = p_i Q_i + 1$ shows that m leaves remainder 1 after dividing by $p_i$. Hence, 
the prime factorization of m is $m = q_1 \cdots q_t$, where, for each j, either $q_j = 3$ 
or $q_j \equiv 1 \bmod 3$. Thus, $m = q_1 \cdots q_t \equiv 0 \bmod 3$ or $m = q_1 \cdots q_t \equiv 1 \bmod 3$, 
contradicting $m \equiv 2 \bmod 3$. • 

The next result shows how congruence can simplify complicated expres- 
sions. 


Proposition 1.60. If p is a prime and a and b are integers, then 

$(a + b)^p \equiv a^p + b^p \bmod p$. 

Proof. The binomial theorem gives 

$(a + b)^p = \sum_{r=0}^{p} \binom{p}{r} a^{p-r} b^r$. 

But Proposition 1.36 gives $\binom{p}{r} \equiv 0 \bmod p$ for $0 < r < p$, and so Proposition 1.57 
gives $(a + b)^p \equiv a^p + b^p \bmod p$. • 

Theorem 1.61 (Fermat). 

(i) If p is a prime, then 

$a^p \equiv a \bmod p$ 

for every a in Z. 

(ii) If p is a prime, then 

$a^{p^k} \equiv a \bmod p$ 

for every a in Z and every integer $k \ge 1$. 

Proof. 

(i) Assume first that $a \ge 0$; we proceed by induction on a. The base step a = 0 
is plainly true. For the inductive step, observe that 

$(a + 1)^p \equiv a^p + 1 \bmod p$, 

by Proposition 1.60. The inductive hypothesis gives $a^p \equiv a \bmod p$, and so 
$(a + 1)^p \equiv a^p + 1 \equiv a + 1 \bmod p$, as desired. 

Now consider −a, where a > 0. If p = 2, then $-a \equiv a \bmod 2$; hence, $(-a)^2 = 
a^2 \equiv a \equiv -a \bmod 2$. If p is an odd prime, then $(-a)^p = (-1)^p a^p \equiv 
(-1)^p a = -a \bmod p$, as desired. 

(ii) A straightforward induction on $k \ge 1$; the base step is part (i). • 

Corollary 1.62. A positive integer a is divisible by 3 if and only if the sum of 
its (decimal) digits is divisible by 3. 

Proof. If the decimal form of a is $d_k \ldots d_1 d_0$, then 

$a = d_k 10^k + \cdots + d_1 10 + d_0$. 

Now $10 \equiv 1 \bmod 3$, so that Proposition 1.57(iii) gives $10^i \equiv 1^i = 1 \bmod 3$ for 
all i; thus Proposition 1.57(i) gives $a \equiv d_k + \cdots + d_1 + d_0 \bmod 3$. Therefore, a 
is divisible by 3 if and only if $a \equiv 0 \bmod 3$ if and only if $d_k + \cdots + d_1 + d_0 \equiv 
0 \bmod 3$. • 

Remark. Since $10 \equiv 1 \bmod 9$, the same result holds if we replace 3 by 9 (it is 
often called casting out 9’s): a positive integer a is divisible by 9 if and only if 
the sum of its (decimal) digits is divisible by 9. ◄ 
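
Fermat’s theorem and the digit-sum tests lend themselves to a quick numerical check. The following Python sketch is a spot check over small ranges, not a proof; `fermat_holds` and `digit_sum` are our own names, and the three-argument built-in `pow(a, e, p)` computes the remainder of $a^e$ after dividing by p.

```python
def fermat_holds(p, a_values=range(-50, 51), k_max=3):
    """Check that a**(p**k) and a leave the same remainder mod p, for 1 <= k <= k_max."""
    return all(
        pow(a, p ** k, p) == a % p
        for a in a_values
        for k in range(1, k_max + 1)
    )


def digit_sum(a):
    """Sum of the decimal digits of a positive integer a."""
    return sum(int(ch) for ch in str(a))


print([p for p in (2, 3, 5, 7, 11, 13) if fermat_holds(p)])
print(fermat_holds(4))  # the modulus 4 is not prime, and the congruence fails

# Corollary 1.62 and the remark on casting out 9's, for a = 1, ..., 9999.
print(all((a % 3 == 0) == (digit_sum(a) % 3 == 0) for a in range(1, 10000)))
print(all((a % 9 == 0) == (digit_sum(a) % 9 == 0) for a in range(1, 10000)))
```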

Corollary 1.63. Let p be a prime and let n be a positive integer. If m > 0 and 
if E is the sum of the p-adic digits ofm, then 

= /; z mod p. 

Proof. Let m = dpp k + • • • + d\p + do be the expression of m in base p. 
By Fermat’s theorem, Theorem 1 .6 1 (ii), n p ' = n mod p for all i ; thus, n diP ' = 
(n d ‘) p ' = n di mod p. Therefore, 

n rn _ n d k p k +-+d l p+d 0 

= n dkpk n dk - ipk ~ l ■ ■ ■ n dlP n d ° 

= n dk n dk ~ l ■ ■ ■ n dl n d ° mod p 
= n dk+ - +d l+do mod p 

= mod p. • 

Example 1.64. 

What is the remainder after dividing $3^{12345}$ by 7? By Example 1.45, the 7-adic
digits of 12345 are 50664. Therefore, $3^{12345} \equiv 3^{21} \pmod 7$ (because 5 + 0 + 6 +
6 + 4 = 21). The 7-adic digits of 21 are 30 (because 21 = 3 × 7), and so
$3^{21} \equiv 3^3 \pmod 7$ (because 3 + 0 = 3). We conclude that
$3^{12345} \equiv 3^3 = 27 \equiv 6 \pmod 7$.

◄
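The repeated digit-sum reduction of Example 1.64 can be sketched in Python as follows. The function names are hypothetical, and the final comparison with Python's built-in `pow` is only a sanity check on the example.

```python
def p_adic_digits(m: int, p: int) -> list[int]:
    """Return the base-p digits of m >= 0, most significant digit first."""
    digits = []
    while m > 0:
        m, r = divmod(m, p)
        digits.append(r)
    return digits[::-1] or [0]


def reduce_exponent(m: int, p: int) -> int:
    """Replace m by the sum of its p-adic digits until the result is below p."""
    while m >= p:
        m = sum(p_adic_digits(m, p))
    return m


def power_mod_prime(n: int, m: int, p: int) -> int:
    """Compute n**m mod p (p prime, n > 0) after shrinking the exponent."""
    return pow(n, reduce_exponent(m, p), p)


# Example 1.64: 12345 has 7-adic digits 5, 0, 6, 6, 4, and 3**12345 = 6 mod 7.
assert p_adic_digits(12345, 7) == [5, 0, 6, 6, 4]
assert reduce_exponent(12345, 7) == 3
assert power_mod_prime(3, 12345, 7) == pow(3, 12345, 7) == 6
```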


Theorem 1.65. If (a, m) = 1, then, for every integer b, the congruence

$ax \equiv b \pmod m$

can be solved for x; in fact, x = sb, where $sa \equiv 1 \pmod m$. Moreover, any two
solutions are congruent mod m.

Remark. We consider the case (a, m) ≠ 1 in Exercise 1.82 on page 71. ◄

Proof. Since (a, m) = 1, there is an integer s with $as \equiv 1 \pmod m$ (because
there is a linear combination 1 = sa + tm). It follows that b = sab + tmb and
$asb \equiv b \pmod m$, so that x = sb is a solution. (Note that Proposition 1.55(i)
allows us to take s with $1 \le s < m$.)

If y is another solution, then $ax \equiv ay \pmod m$, and so $m \mid a(x - y)$. Since
(a, m) = 1, Corollary 1.37 gives $m \mid (x - y)$; that is, $x \equiv y \pmod m$. •


Corollary 1.66. If p is prime, the congruence ax = b mod p is always solvable 
if a is not divisible by p. 

Proof. Since p is a prime, $p \nmid a$ implies (a, p) = 1. •

Example 1.67. 

When (a, m) = 1, Theorem 1.65 says that the solutions to $ax \equiv b \pmod m$ are
precisely those integers of the form sb + km for k in Z, where $sa \equiv 1 \pmod m$;
that is, where sa + tm = 1. Thus, s can always be found by the Euclidean
algorithm. However, when m is small, it is easier to find such an integer s by
trying each of ra = 2a, 3a, ..., (m − 1)a in turn, at each step checking whether
$ra \equiv 1 \pmod m$.

For example, let us find all the solutions to

$2x \equiv 9 \pmod{13}$.

Considering the products 2 · 2, 3 · 2, 4 · 2, ... mod 13 quickly leads to
$7 \times 2 = 14 \equiv 1 \pmod{13}$; that is, s = 7 and $x = 7 \cdot 9 = 63 \equiv 11 \pmod{13}$. Therefore,

$x \equiv 11 \pmod{13}$.


Thus, the solutions are ..., −15, −2, 11, 24, .... ◄


Example 1.68. 

Find all the solutions to $51x \equiv 10 \pmod{94}$.

Since 94 is large, seeking an integer s with $51s \equiv 1 \pmod{94}$, as in Example 1.67,
can be tedious. The Euclidean algorithm gives 1 = −35 · 51 + 19 · 94,
and so s = −35. Therefore, the solutions consist of all those numbers congruent
to (−35) × 10 mod 94; that is, numbers of the form −350 + 94k. ◄
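The method of Theorem 1.65 (find s with sa + tm = 1, then take x = sb) can be sketched with an extended Euclidean algorithm. This is an illustrative Python version; `extended_gcd` and `solve_linear_congruence` are names chosen here, not the book's.

```python
def extended_gcd(a: int, b: int) -> tuple[int, int, int]:
    """Return (g, s, t) with g = gcd(a, b) and g = s*a + t*b."""
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_r, old_s, old_t


def solve_linear_congruence(a: int, b: int, m: int) -> int:
    """Return the solution x of a*x = b mod m with 0 <= x < m, assuming gcd(a, m) = 1."""
    g, s, _ = extended_gcd(a, m)
    if g != 1:
        raise ValueError("a and m must be relatively prime")
    return (s * b) % m


# Example 1.67: 2x = 9 mod 13 has solution x = 11 mod 13.
assert solve_linear_congruence(2, 9, 13) == 11
# Example 1.68: the solutions of 51x = 10 mod 94 are the numbers -350 + 94k.
assert solve_linear_congruence(51, 10, 94) == (-350) % 94
```

Reducing the answer into the range 0, ..., m − 1 picks one representative; every other solution differs from it by a multiple of m.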

There are problems solved in ancient Chinese manuscripts that involve
simultaneous congruences with relatively prime moduli.

Theorem 1.69 (Chinese Remainder Theorem). If m and m′ are relatively
prime, then the two congruences

$x \equiv b \pmod m$
$x \equiv b' \pmod{m'}$

have a common solution, and any two solutions are congruent mod mm′.

Proof. Every solution of the first congruence has the form x = b + km for
some integer k; hence, we must find k such that $b + km \equiv b' \pmod{m'}$; that is,
$km \equiv b' - b \pmod{m'}$. Since (m, m′) = 1, however, Theorem 1.65 applies at
once to show that such an integer k does exist.

If y is another common solution, then both m and m′ divide x − y; by Exercise 1.55
on page 52, $mm' \mid (x - y)$, and so $x \equiv y \pmod{mm'}$. •

Example 1.70. 

Find all the solutions to the simultaneous congruences 

$x \equiv 7 \pmod 8$
$x \equiv 11 \pmod{15}$.

Every solution to the first congruence has the form

$x = 7 + 8k$,

for some integer k. Substituting, $x = 7 + 8k \equiv 11 \pmod{15}$, so that

$8k \equiv 4 \pmod{15}$.

But $2 \cdot 8 = 16 \equiv 1 \pmod{15}$, so that multiplying by 2 gives

$16k \equiv k \equiv 8 \pmod{15}$.

We conclude that $x = 7 + 8 \cdot 8 = 71$ is a solution, and the Chinese Remainder
Theorem says that every solution has the form 71 + 120n for n in Z. ◄

Example 1.71. 

Solve the simultaneous congruences 

$x \equiv 2 \pmod 5$
$3x \equiv 5 \pmod{13}$.

Every solution to the first congruence has the form x = 5k + 2 for k in Z.
Substituting into the second congruence, we have

$3(5k + 2) \equiv 5 \pmod{13}$.

Therefore,

$15k + 6 \equiv 5 \pmod{13}$
$2k \equiv -1 \pmod{13}$.

Now $7 \times 2 \equiv 1 \pmod{13}$, and so multiplying by 7 gives

$k \equiv -7 \equiv 6 \pmod{13}$.

By the Chinese Remainder Theorem, all the simultaneous solutions x have the
form

$x = 5k + 2 \equiv 5 \cdot 6 + 2 = 32 \pmod{65}$;

that is, the solutions are

..., −98, −33, 32, 97, 162, .... ◄
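The substitution argument in the proof of the Chinese Remainder Theorem translates directly into a few lines of Python. This is a sketch under two assumptions: the function name `crt_pair` is invented here, and the modular inverse is taken with the built-in `pow(m1, -1, m2)`, which requires Python 3.8 or later.

```python
def crt_pair(b1: int, m1: int, b2: int, m2: int) -> int:
    """Solve x = b1 mod m1 and x = b2 mod m2 for relatively prime m1, m2.

    Follows the proof: write x = b1 + k*m1 and solve k*m1 = b2 - b1 mod m2.
    Returns the unique solution with 0 <= x < m1*m2.
    """
    # pow raises ValueError if m1 has no inverse mod m2 (moduli not coprime).
    k = ((b2 - b1) * pow(m1, -1, m2)) % m2
    return (b1 + k * m1) % (m1 * m2)


# The solution 71 of Example 1.70 satisfies x = 7 mod 8 and x = 11 mod 15.
assert crt_pair(7, 8, 11, 15) == 71

# Example 1.71: first rewrite 3x = 5 mod 13 as x = 5 * 3^(-1) mod 13.
b2 = (5 * pow(3, -1, 13)) % 13
assert crt_pair(2, 5, b2, 13) == 32
```

For a congruence such as 3x ≡ 5 mod 13, the coefficient of x is removed first; this is the same step as "multiplying by 7" in the worked example, performed in a different order.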


Example 1.72 (A Mayan Calendar). 

A congruence arises whenever there is cyclic behavior. For example, suppose we
choose some particular Sunday as time zero and enumerate all the days according
to the time elapsed since then. Every date now corresponds to some integer
(which is negative if it occurred before time zero), and, given two dates $t_1$ and
$t_2$, we ask for the number $x = t_2 - t_1$ of days from one to the other. If, for
example, $t_1$ falls on a Thursday and $t_2$ falls on a Tuesday, then $t_1 \equiv 4 \pmod 7$ and
$t_2 \equiv 2 \pmod 7$, and so $x = t_2 - t_1 \equiv -2 \equiv 5 \pmod 7$. Thus, x = 7k + 5 for
some k.

About 2500 years ago, the Maya of Central America and Mexico developed 
three calendars (each having a different use). Their religious calendar, called 
tzolkin, consisted of 20 “months,” each having 13 days (so that the tzolkin “year” 
had 260 days). The months were 


 1. Imix        6. Cimi      11. Chuen     16. Cib
 2. Ik          7. Manik     12. Eb        17. Caban
 3. Akbal       8. Lamat     13. Ben       18. Etznab
 4. Kan         9. Muluc     14. Ix        19. Cauac
 5. Chicchan   10. Oc        15. Men       20. Ahau


Let us describe a tzolkin date by an ordered pair {m, d}, where $1 \le m \le 20$
and $1 \le d \le 13$ (thus, m denotes the month and d denotes the day). Instead
of enumerating as we do (so that Imix 1 is followed by Imix 2, then by Imix 3,
and so forth), the Maya let both month and day cycle simultaneously; that is, the
days proceed as follows:

Imix 1, Ik 2, Akbal 3, ..., Ben 13, Ix 1, Men 2, ...,

Cauac 6, Ahau 7, Imix 8, Ik 9, ....

We now ask how many days have elapsed between Oc 11 and Etznab 5.
More generally, let us find the number x of days that have elapsed from tzolkin
{m, d} to tzolkin {m′, d′}. As we remarked at the beginning of this example, the
cyclic behavior of the days gives the congruence

$x \equiv d' - d \pmod{13}$

(e.g., there are 13 days between Imix 1 and Ix 1; here, $x \equiv 0 \pmod{13}$), while the
cyclic behavior of the months gives the congruence

$x \equiv m' - m \pmod{20}$

(e.g., there are 20 days between Imix 1 and Imix 8; here, $x \equiv 0 \pmod{20}$). To
answer the original question, Oc 11 corresponds to the ordered pair {10, 11} and
Etznab 5 corresponds to {18, 5}. The simultaneous congruences are thus

$x \equiv -6 \pmod{13}$
$x \equiv 8 \pmod{20}$.

Since (13, 20) = 1, we can solve this system as in the proof of the Chinese 
Remainder Theorem. The first congruence gives 

x = 13k — 6, 


and the second gives 


13k — 6 = 8 mod 20; 


that is, 


13k = 14 mod 20. 

Since $13 \times 17 = 221 \equiv 1 \pmod{20}$,^20 we have $k \equiv 17 \times 14 \pmod{20}$, that is,

k = 18 mod 20, 

and so the Chinese Remainder Theorem gives 

x = 13k — 6 = 13 x 18 — 6 = 228 mod 260. 


It is not clear whether Oc 11 precedes Etznab 5 in a given year (one must look).
If it does, then there are 228 days between them; otherwise, there are 32 =
260 − 228 days between them. ◄
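The tzolkin computation can be checked with a short Python sketch. The month list is the one given in this example; the function name `days_between` is made up for illustration, and `pow(13, -1, 20)` (Python 3.8 or later) supplies the inverse 17 of 13 mod 20.

```python
MONTHS = ["Imix", "Ik", "Akbal", "Kan", "Chicchan", "Cimi", "Manik", "Lamat",
          "Muluc", "Oc", "Chuen", "Eb", "Ben", "Ix", "Men", "Cib", "Caban",
          "Etznab", "Cauac", "Ahau"]


def days_between(month1: str, day1: int, month2: str, day2: int) -> int:
    """Days elapsed from tzolkin {month1, day1} to the next {month2, day2}, mod 260."""
    m1 = MONTHS.index(month1) + 1
    m2 = MONTHS.index(month2) + 1
    b13 = (day2 - day1) % 13          # x = d' - d mod 13
    b20 = (m2 - m1) % 20              # x = m' - m mod 20
    # Write x = b13 + 13k and solve 13k = b20 - b13 mod 20.
    k = ((b20 - b13) * pow(13, -1, 20)) % 20
    return (b13 + 13 * k) % 260


assert days_between("Oc", 11, "Etznab", 5) == 228
assert days_between("Etznab", 5, "Oc", 11) == 32
```

The function always counts forward in the 260-day cycle, which is why swapping the two dates gives the complementary answer 32.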

^20 One finds 17 either by trying each number between 1 and 19 or by using the Euclidean
algorithm.

Example 1.73 (Public Key Cryptography).

In a war between A and B, spies for A learn of a surprise attack being planned 
by B, and so they must send an urgent message back home. If B learns that its 
plans are known to A, it will, of course, change them, and so A’s spies put the 
message in code before sending it. 

It is no problem to convert a message in English into a number. Make a list 
of the 52 English letters (lower case and upper case) together with a space and 
the 11 punctuation marks


In all, there are 64 symbols. Assign a two-digit number to each symbol. For 
example, 


a ↦ 01, ..., z ↦ 26, A ↦ 27, ..., Z ↦ 52,

space ↦ 53, . ↦ 54, , ↦ 55, ..., ( ↦ 63, ) ↦ 64.

A cipher is a code in which distinct letters in the original message are replaced 
by distinct symbols. It is not difficult to decode any cipher; indeed, many news- 
papers print daily cryptograms to entertain their readers. In the cipher we have 
just described, “I love you.” is encoded 

I love you. = 3553121522055325152154. 

Notice that each coded message in this cipher has an even number of digits, and 
so decoding, converting the number into English, is a simple matter. Thus, 

3553121522055325152154 = (35)(53)(12)(15)(22)(05)(53)(25)(15)(21)(54)

= I love you. 
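The two-digit cipher can be sketched in Python. Only the symbol values stated explicitly above are included (letters, space, period, comma, and the two parentheses); the remaining punctuation marks are not listed here, so the sketch leaves them out rather than guess. The names `SYMBOLS`, `encode`, and `decode` are illustrative.

```python
import string

# Values taken from the text: a..z -> 01..26, A..Z -> 27..52, then punctuation.
SYMBOLS = {ch: i for i, ch in enumerate(string.ascii_lowercase, start=1)}
SYMBOLS.update({ch: i for i, ch in enumerate(string.ascii_uppercase, start=27)})
SYMBOLS.update({" ": 53, ".": 54, ",": 55, "(": 63, ")": 64})
REVERSE = {value: ch for ch, value in SYMBOLS.items()}


def encode(message: str) -> str:
    """Replace each symbol by its two-digit number."""
    return "".join(f"{SYMBOLS[ch]:02d}" for ch in message)


def decode(digits: str) -> str:
    """Split the digit string into pairs and look each pair up."""
    pairs = (digits[i:i + 2] for i in range(0, len(digits), 2))
    return "".join(REVERSE[int(pair)] for pair in pairs)


assert encode("I love you.") == "3553121522055325152154"
assert decode("3553121522055325152154") == "I love you."
```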

What makes a good code? If a message is a natural number x (and this is 
no loss in generality), we need a way to encode x (in a fairly routine way so as 
to avoid introducing any errors into the coded message), and we need a (fairly 
routine) method for the recipient to decode the message. Of utmost importance 
is security: an unauthorized reader of the (coded) message should not be able 
to decode it. An ingenious way to find a code with these properties, now called 
RSA public key cryptography, was found in 1978 by R. Rivest, A. Shamir, and 
L. Adleman; they received the 2002 Turing Award for their discovery. 

Given natural numbers N, s, and t, suppose that $x^{st} \equiv x \pmod N$ for every
natural number x. We can encode any natural number x < N as $[x^s]_N$, the
remainder of $x^s$ mod N, and we can decode this if we know the number t, for

$(x^s)^t = x^{st} \equiv x \pmod N$.

It remains to find numbers N, s, and t satisfying the several criteria for a good
code.

I: Ease of Encoding and Decoding.

Suppose that N has d (decimal) digits. It is enough to show how to encode
a number x with at most d digits, for we can subdivide a longer number into
blocks each having at most d digits. An efficient computation of $x^s$ mod N is
based on the fact that computing $x^2$ mod N is an easy task for a computer. Since
computing $x^{2^i}$ is just computing i squares, this, too, is an easy task. Now write
the exponent s in base 2, so that computing $x^s$ is the same as multiplying several
squares. If $m = 2^i + 2^j + \cdots + 2^z$, then
$x^m = x^{2^i + 2^j + \cdots + 2^z} = x^{2^i} x^{2^j} \cdots x^{2^z}$.

In short, computers can encode a message in this way with no difficulty.

Decoding involves computing $(x^s)^t$ mod N, and this is also an easy task
(assuming t is known) if, as above, we write t in base 2.
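The repeated-squaring idea can be sketched as follows. The function name `power_mod` is a hypothetical choice; Python's built-in three-argument `pow` computes the same value and is used here only as a cross-check.

```python
def power_mod(x: int, s: int, n: int) -> int:
    """Compute x**s mod n by reading the binary digits of s from the right.

    `square` runs through x, x^2, x^4, x^8, ... (all reduced mod n); the
    squares belonging to the 1-bits of s are multiplied together.
    """
    result = 1 % n
    square = x % n
    while s > 0:
        if s & 1:                       # this power of 2 occurs in s
            result = (result * square) % n
        square = (square * square) % n  # next repeated square
        s >>= 1
    return result


assert power_mod(3, 12345, 7) == pow(3, 12345, 7)
assert power_mod(2, 10, 1000) == 24
```

The loop runs once per binary digit of s, so the number of multiplications grows with the number of digits of s rather than with s itself.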

II: Constructing N and m = st.

Choose distinct primes p and q, both congruent to 2 mod 3, and define N =
pq. If $m \ge p$, then

$x^m = x^{m-p} x^p \equiv x^{m-p} x = x^{m-(p-1)} \pmod p$,

by Fermat’s theorem. If $m - (p - 1) \ge p$, we may repeat this, continuing until
we have

$x^{m-(p-1)} = x^{m-(p-1)-p} x^p$
$\equiv x^{m-(p-1)-p} x$
$= x^{m-2(p-1)}$
$\vdots$
$\equiv x^{m-h(p-1)} \pmod p$,

where h is the largest integer for which $m - h(p - 1) \ge 0$. But this is just the
division algorithm: m = h(p − 1) + r, where r is the remainder after dividing
m by p − 1. Hence, for all x,

$x^m \equiv x^r \pmod p$.

Therefore, if $m \equiv 1 \pmod{p-1}$, then

$x^m \equiv x \pmod p$ for all x.

Similarly, if $m \equiv 1 \pmod{q-1}$, then $x^m \equiv x \pmod q$ for all x. Therefore,
if m is chosen such that

$m \equiv 1 \pmod{(p-1)(q-1)}$,

then $x^m \equiv x \pmod p$ and $x^m \equiv x \pmod q$; that is, $p \mid (x^m - x)$ and $q \mid (x^m - x)$.
As p and q are distinct primes, they are relatively prime, and so $pq \mid (x^m - x)$,
by Exercise 1.55 on page 52. Since N = pq, we have shown that if
$m \equiv 1 \pmod{(p-1)(q-1)}$, then

$x^m \equiv x \pmod N$ for all x.

It remains to find such a number m and a factorization m = st. We claim that
there is a factorization with s = 3. Let us first show that (3, (p − 1)(q − 1)) =
1. Since $p \equiv 2 \pmod 3$ and $q \equiv 2 \pmod 3$, we have $p - 1 \equiv 1 \pmod 3$ and
$q - 1 \equiv 1 \pmod 3$; hence, $(p-1)(q-1) \equiv 1 \pmod 3$, so that 3 and (p − 1)(q − 1)
are relatively prime [Proposition 1.31]. Thus, there are integers t and u with
1 = 3t + (p − 1)(q − 1)u, so that $3t \equiv 1 \pmod{(p-1)(q-1)}$. To sum up,
$x^{3t} \equiv x \pmod N$ for all x with this choice of t. Choosing m = 3t completes the
construction of the ingredients of the code.

III: Security.

Since $3t \equiv 1 \pmod{(p-1)(q-1)}$, he who knows the factorization N = pq
knows the number (p − 1)(q − 1), and hence he can find t using the Euclidean
algorithm. Unauthorized readers may know N, but without knowing its factorization,
they do not know t and, hence, they cannot decode. This is why this
code is secure today. For example, if both p and q have about 200 digits (and,
for technical reasons, they are not too close together), then the fastest existing
computers need two or three months to factor N. By Proposition 1.59, there are
plenty of primes congruent to 2 mod 3, and so we may choose a different pair of
primes p and q every month, say, thereby stymying the enemy. ◄
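A toy version of the whole construction, with s = 3, can be sketched in Python. The primes 101 and 113 are illustrative choices (both are congruent to 2 mod 3) and are far too small to offer any security; the function names are invented for this sketch, and `pow(3, -1, ...)` requires Python 3.8 or later.

```python
def make_keys(p: int, q: int) -> tuple[int, int, int]:
    """Return (N, s, t) with N = p*q, s = 3, and 3t = 1 mod (p-1)(q-1).

    Assumes p and q are primes; only the congruence conditions are checked.
    """
    if p == q or p % 3 != 2 or q % 3 != 2:
        raise ValueError("need distinct primes congruent to 2 mod 3")
    n = p * q
    t = pow(3, -1, (p - 1) * (q - 1))   # what the Euclidean algorithm would give
    return n, 3, t


def encode(x: int, n: int, s: int) -> int:
    """Encode x < n as the remainder of x**s mod n."""
    return pow(x, s, n)


def decode(c: int, n: int, t: int) -> int:
    """Decode by raising to the secret exponent t."""
    return pow(c, t, n)


N, S, T = make_keys(101, 113)
message = 3553                          # an arbitrary block smaller than N
assert decode(encode(message, N, S), N, T) == message
```

Anyone may be told N and s; only someone who knows p and q can compute (p − 1)(q − 1) and hence t.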


Exercises 

1.71 Find all the integers x which are solutions to each of the following congruences:

(i) $3x \equiv 2 \pmod 5$.

(ii) $7x \equiv 4 \pmod{10}$.

(iii) $243x + 17 \equiv 101 \pmod{725}$.

(iv) $4x + 3 \equiv 4 \pmod 5$.

(v) $6x + 3 \equiv 4 \pmod{10}$.

(vi) $6x + 3 \equiv 1 \pmod{10}$.

1.72 Let m be a positive integer, and let m′ be an integer obtained from m by rearranging
its (decimal) digits (e.g., take m = 314159 and m′ = 539114). Prove that m − m′
is a multiple of 9.

1.73 Prove that a positive integer n is divisible by 11 if and only if the alternating sum of
its digits is divisible by 11 (if the digits of n are $d_k \ldots d_2 d_1 d_0$, then their alternating
sum is $d_0 - d_1 + d_2 - \cdots$).

1.74 What is the remainder after dividing $10^{100}$ by 7? (The huge number $10^{100}$ is called
a googol^21 in children’s stories.)

*1.75 (i) Prove that 10q + r is divisible by 7 if and only if q − 2r is divisible by 7.

(ii) Given an integer a with decimal digits $d_k d_{k-1} \cdots d_0$, define

$a' = d_k d_{k-1} \cdots d_1 - 2d_0$.

Show that a is divisible by 7 if and only if some one of a′, a″, a‴, ... is
divisible by 7. (For example, if a = 65464, then a′ = 6546 − 8 = 6538,
a″ = 653 − 16 = 637, and a‴ = 63 − 14 = 49; we conclude that 65464
is divisible by 7.)

*1.76 (i) Show that $1000 \equiv -1 \pmod 7$.

(ii) Show that if $a = r_0 + 1000 r_1 + 1000^2 r_2 + \cdots$, then a is divisible by 7 if
and only if $r_0 - r_1 + r_2 - \cdots$ is divisible by 7.


Remark. Exercises 1.75 and 1.76 combine to give an efficient way to determine whether
large numbers are divisible by 7. If a = 33456789123987, for example, then
$a \equiv 0 \pmod 7$ if and only if $987 - 123 + 789 - 456 + 33 = 1230 \equiv 0 \pmod 7$. By Exercise 1.75
on page 71, 1230 is divisible by 7 if and only if 123 is, and 123 is divisible by 7 if and
only if 12 − 2 · 3 = 6 is; hence, a is not divisible by 7. ◄

*1.77 For a given positive integer m, find all integers r with $0 \le r < m$ such that
$2r \equiv 0 \pmod m$.

1.78 Prove that there are no integers x, y, and z such that

$x^2 + y^2 + z^2 = 999$.

1.79 Prove that there is no perfect square $a^2$ whose last two digits are 35.

1.80 If x is an odd number not divisible by 3, prove that $x^2 \equiv 1 \pmod{24}$.

*1.81 Prove that if p is a prime and if $a^2 \equiv 1 \pmod p$, then $a \equiv \pm 1 \pmod p$.

*1.82 Consider the congruence $ax \equiv b \pmod m$ when gcd(a, m) = d. Show that
$ax \equiv b \pmod m$ has a solution if and only if $d \mid b$.

1.83 Solve the congruence $x^2 \equiv 1 \pmod{21}$.

1.84 Solve the simultaneous congruences: 

(i) x = 2 mod 5 and 3x = 1 mod 8; 

(ii) 3x = 2 mod 5 and 2x = 1 mod 3. 

1.85 How many days are there between Akbal 13 and Muluc 8 in the Mayan tzolkin 
calendar? 

1.86 (i) Show that $(a + b)^n \equiv a^n + b^n \pmod 2$ for all a and b and for all $n \ge 1$.

(ii) Show that $(a + b)^2 \not\equiv a^2 + b^2 \pmod 3$.

1.87 On a desert island, five men and a monkey gather coconuts all day, then sleep. The
first man awakens and decides to take his share. He divides the coconuts into five
equal shares, with one coconut left over. He gives the extra one to the monkey,
hides his share, and goes to sleep. Later, the second man awakens and takes his
fifth from the remaining pile; he too finds one extra and gives it to the monkey.
Each of the remaining three men does likewise in turn. Find the minimum number
of coconuts originally present.

^21 This word was invented by a 9-year-old boy when his uncle asked him to think up a name
for the number 1 followed by a hundred zeros. At the same time, the boy suggested googolplex
for a 1 followed by a googol zeros.


1.6 Dates and Days 

Congruences can be used to determine on which day of the week a given date 
falls. For example, on what day of the week was July 4, 1776? 

A year is the amount of time it takes the Earth to make one complete orbit 
around the Sun; a day is the amount of time it takes the Earth to make a complete 
rotation about the axis through its north and south poles. There is no reason 
why the number of days in a year should be an integer, and it is not; a year is 
approximately 365.2422 days long. In 46 B.C., Julius Caesar (and his scientific 
advisors) compensated for this by creating the Julian calendar , containing a leap 
year every 4 years; that is, every fourth year has an extra day, namely, February 
29, and so it contains 366 days (a common year is a year that is not a leap year). 
This would be fine if the year were exactly 365.25 days long, but it has the effect
of making the year 365.25 − 365.2422 = 0.0078 days (about 11 minutes and 14
seconds) too long. After 128 years, a full day was added to the calendar; that is,
the Julian calendar overcounted the number of days. In the year 1582, the vernal
equinox (the Spring day on which there are exactly 12 hours of daylight and 12
hours of night) occurred on March 11 instead of on March 21. Pope Gregory XIII
(and his scientific advisors) then installed the Gregorian calendar by erasing 10 
days that year; the day after October 4, 1582 was October 15, 1582, and this 
caused confusion and fear among the people. The Gregorian calendar modified 
the Julian calendar as follows. Call a year y ending in 00 a century year. If 
a year y is not a century year, then it is a leap year if it is divisible by 4; if y 
is a century year, it is a leap year only if it is divisible by 400. For example, 
1900 is not a leap year, but 2000 is a leap year. The Gregorian calendar is the 
one in common use today, but it was not uniformly adopted throughout Europe. 
For example, the British did not accept it until 1752, when 11 days were erased,
and the Russians did not accept it until 1918, when 13 days were erased (thus, 
the Russians called their 1917 revolution the October Revolution, even though it 
occurred in November of the Gregorian calendar). 

The true number of days in 400 years is about

400 × 365.2422 = 146,096.88 days.

In this period, the Julian calendar has

400 × 365 + 100 = 146,100 days,

while the Gregorian calendar has 146,097 days (it eliminated 3 leap years from
this time period). Thus, the Julian calendar gains about 3.12 days every 400
years, while the Gregorian calendar gains only 0.12 days (about 2 hours and 53
minutes).

A little arithmetic shows that there are 1628 years from 46 B.C. to 1582.
The Julian calendar overcounts one day every 128 years, and so it overcounted
about 13 days in this period (for 13 × 128 = 1664). Why didn’t Gregory have to erase
13 days? The Council of Nicaea, meeting in the year 325, defined Easter as the 
first Sunday strictly after the Paschal full moon, which is the first full moon on 
or after the vernal equinox. The vernal equinox in 325 fell on March 21, and the 
Synod of Whitby, in 664, officially defined the vernal equinox to be March 21. 
The discrepancy observed in 1582 was thus the result of only 1257 = 1582 — 325 
years of the Julian calendar: approximately 10 days. 

Let us now seek a calendar formula. For easier calculation, we choose 0000 
as our reference year, even though there was no year zero ! Assign a number to 
each day of the week, according to the following scheme: 

Sun Mon Tues Wed Thurs Fri Sat 

0 1 2 3 4 5 6 

In particular, March 1, 0000, has some number a, where $0 \le a \le 6$. In the next
year 0001, March 1 has number a + 1 (mod 7), for 365 days have elapsed from
March 1, 0000, to March 1, 0001, and

$365 = 52 \times 7 + 1 \equiv 1 \pmod 7$.

Similarly, March 1, 0002, has number a + 2, and March 1, 0003, has number
a + 3. However, March 1, 0004, has number a + 5, for February 29, 0004, fell
between March 1, 0003, and March 1, 0004, and so $366 \equiv 2 \pmod 7$ days had
elapsed since the previous March 1. We see, therefore, that every common year
adds 1 to the previous number for March 1, while each leap year adds 2. Thus,
if March 1, 0000, has number a, then the number a′ of March 1, year y, is

$a' \equiv a + y + L \pmod 7$,

where L is the number of leap years from year 0000 to year y. To compute L,
count all those years divisible by 4, then throw away all the century years, and
then put back those century years that are leap years. Thus,

$L = \lfloor y/4 \rfloor - \lfloor y/100 \rfloor + \lfloor y/400 \rfloor$,

where $\lfloor x \rfloor$ denotes the greatest integer in x. Therefore, we have

$a' \equiv a + y + L$
$\equiv a + y + \lfloor y/4 \rfloor - \lfloor y/100 \rfloor + \lfloor y/400 \rfloor \pmod 7$.

We can actually find a by looking at a calendar. Since March 1, 1994, fell on a
Tuesday,

$2 \equiv a + 1994 + \lfloor 1994/4 \rfloor - \lfloor 1994/100 \rfloor + \lfloor 1994/400 \rfloor$
$= a + 1994 + 498 - 19 + 4 \pmod 7$,

and so

$a \equiv -2475 \equiv -4 \equiv 3 \pmod 7$

(that is, March 1, year 0000, fell on Wednesday). One can now determine the
day of the week on which March 1 will fall in any year y > 0, namely, the day
corresponding to

$3 + y + \lfloor y/4 \rfloor - \lfloor y/100 \rfloor + \lfloor y/400 \rfloor \pmod 7$.

There is a reason we have been discussing March 1.^22 Had Julius Caesar decreed
that the extra day of a leap year be December 32 instead of February 29,
life would have been simpler. Let us now analyze February 28. For example,
suppose that February 28, 1600, has number b. As 1600 is a leap year, February 
29, 1600, occurs between February 28, 1600, and February 28, 1601; hence, 366 
days have elapsed between these two February 28 ’s, so that February 28, 1601, 
has number b + 2. February 28, 1602, has number b + 3, February 28, 1603, has 
number b + 4, February 28, 1604, has number b + 5, but February 28, 1605, has 
number b + 7 (for there was a February 29 in 1604). 

Let us compare the pattern of behavior of February 28, 1600, namely, b, 
b + 2, b + 3, b + 4, b + 5, b + 7, . . . , with that of some date in 1599. If May 
26, 1599, has number c, then May 26, 1600, has number c + 2, for February 
29, 1600, comes between these two May 26’s, and so there are 366 = 2 mod 7 
intervening days. The numbers of the next few May 26’s, beginning with May 
26, 1601, are c + 3, c + 4, c + 5, c + 7. We see that the pattern of the days 
for February 28, starting in 1600, is exactly the same as the pattern of the days 
for May 26, starting in 1599; indeed, the same is true for any date in January or 
February. Thus, the pattern of the days for any date in January or February of a 
year y is the same as the pattern for a date occurring in the preceding year y − 1:
a year preceding a leap year adds 2 to the number for such a date, whereas all
other years add 1. Therefore, we revert to the ancient calendar by making New
Year’s Day fall on March 1; any date in January or February is treated as if it had
occurred in the previous year.

^22 Actually, March 1 was the first day of the year in the old Roman calendar. This explains
why the leap day was added onto February and not onto some other month. It also explains
why months 9, 10, 11, and 12, namely, September, October, November, and December, are so
named; originally, they were months 7, 8, 9, and 10.

George Washington’s birthday, in the Gregorian calendar, is February 22, 1732. But the
Gregorian calendar was not introduced in the British colonies until 1752. Thus, his original
birthday was February 11. But New Year’s Day was also changed from March 1 to January 1,
so that February, which had been in 1731, was regarded, after the calendar change, as being
in 1732. George Washington used to joke that not only did his birthday change, but so did his
birth year.

How do we find the day corresponding to a date other than March 1? Since
March 1, 0000, has number 3 (as we have seen above), April 1, 0000, has number
6, for March has 31 days and $3 + 31 \equiv 6 \pmod 7$. Since April has 30 days, May
1, 0000, has number $6 + 30 \equiv 1 \pmod 7$. Here is the table giving the number of
the first day of each month in year 0000:

March 1, 0000, has number  3
April 1                    6
May 1                      1
June 1                     4
July 1                     6
August 1                   2
September 1                5
October 1                  0
November 1                 3
December 1                 5
January 1                  1
February 1                 4

Remember that we are pretending that March is month 1, April is month 2, etc.
Let us denote these numbers by 1 + j(m), where j(m), for m = 1, 2, ..., 12, is
defined by

j(m): 2, 5, 0, 3, 5, 1, 4, 6, 2, 4, 0, 3.

It follows that month m, day 1, year y, has number

$1 + j(m) + g(y) \pmod 7$,

where

$g(y) = y + \lfloor y/4 \rfloor - \lfloor y/100 \rfloor + \lfloor y/400 \rfloor$.

Proposition 1.74 (Calendar^23 Formula). The date with month m, day d,
year y has number

$d + j(m) + g(y) \pmod 7$,

where

j(m) = 2, 5, 0, 3, 5, 1, 4, 6, 2, 4, 0, 3

(March corresponds to m = 1, April to m = 2, ..., February to m = 12) and

$g(y) = y + \lfloor y/4 \rfloor - \lfloor y/100 \rfloor + \lfloor y/400 \rfloor$,

provided that dates in January and February are treated as having occurred in
the previous year.

Proof. The number mod 7 corresponding to month m, day 1, year y, is
1 + j(m) + g(y). It follows that 2 + j(m) + g(y) corresponds to month m,
day 2, year y, and, more generally, that d + j(m) + g(y) corresponds to month
m, day d, year y. •

^23 The word calendar comes from the Greek “to call,” which evolved into the Latin for the
first day of a month (when accounts were due).


Example 1.75. 

Let us use the calendar formula to find the day of the week on which July 4,
1776, fell. Here m = 5, d = 4, and y = 1776. Substituting in the formula, we
obtain the number

$4 + 5 + 1776 + 444 - 17 + 4 = 2216 \equiv 4 \pmod 7$;

therefore, July 4, 1776, fell on a Thursday. ◄
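The calendar formula is short enough to transcribe into Python. In this sketch the names `J`, `g`, and `day_number` are illustrative, months are passed in the usual numbering (1 = January) and converted internally to the March-based numbering, and Python's `datetime` module (which uses the Gregorian rules for all years) serves only as an independent check.

```python
import datetime

# j(m) for m = 1 (March), ..., 12 (February), as listed in the text.
J = [2, 5, 0, 3, 5, 1, 4, 6, 2, 4, 0, 3]


def g(y: int) -> int:
    """g(y) = y + floor(y/4) - floor(y/100) + floor(y/400)."""
    return y + y // 4 - y // 100 + y // 400


def day_number(year: int, month: int, day: int) -> int:
    """Return 0 (Sunday) through 6 (Saturday) for a Gregorian date.

    `month` uses the usual numbering 1 = January, ..., 12 = December.
    """
    if month <= 2:
        # January and February count as months 11 and 12 of the previous year.
        m, y = month + 10, year - 1
    else:
        m, y = month - 2, year
    return (day + J[m - 1] + g(y)) % 7


assert day_number(1776, 7, 4) == 4   # Thursday, as in Example 1.75
# isoweekday() numbers Monday..Sunday as 1..7, so reduce mod 7 to compare.
assert day_number(1994, 3, 1) == datetime.date(1994, 3, 1).isoweekday() % 7
```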

Most of us need paper and pencil (or a calculator) to use the calendar formula 
in the theorem. Here are some ways to simplify the formula so that one can do 
the calculation in one’s head and amaze one’s friends. 

One mnemonic for j(m) is given by

$j(m) \equiv \lfloor 2.6m - 0.2 \rfloor \pmod 7$, where $1 \le m \le 12$.

Another mnemonic for j(m) is the sentence:

My Uncle Charles has eaten a cold supper; he eats nothing hot.

2 5 (7 ≡ 0) 3 5 1 4 6 2 4 (7 ≡ 0) 3
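Both mnemonics can be checked mechanically. In this small Python sketch the floor expression is evaluated with integers, as ⌊(26m − 2)/10⌋, to avoid floating-point rounding, and the results are reduced mod 7; the variable names are illustrative.

```python
# j(m) for m = 1 (March), ..., 12 (February), as listed in the text.
J = [2, 5, 0, 3, 5, 1, 4, 6, 2, 4, 0, 3]

# floor(2.6 m - 0.2) computed exactly as floor((26 m - 2) / 10), then reduced mod 7.
floor_rule = [((26 * m - 2) // 10) % 7 for m in range(1, 13)]

# Word lengths of the sentence, ignoring punctuation, reduced mod 7.
sentence = "My Uncle Charles has eaten a cold supper; he eats nothing hot."
word_rule = [len(word.strip(";.")) % 7 for word in sentence.split()]

assert floor_rule == J
assert word_rule == J
```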


Corollary 1.76. The date with month m, day d, year y = 100C + N, where
$0 \le N \le 99$, has number

$d + j(m) + N + \lfloor N/4 \rfloor + \lfloor C/4 \rfloor - 2C \pmod 7$,

provided that dates in January and February are treated as having occurred in
the previous year.

Proof. If we write a year y = 100C + N, where $0 \le N \le 99$, then

$y = 100C + N \equiv 2C + N \pmod 7$,

$\lfloor y/4 \rfloor = 25C + \lfloor N/4 \rfloor \equiv 4C + \lfloor N/4 \rfloor \pmod 7$,

$\lfloor y/100 \rfloor = C$, and $\lfloor y/400 \rfloor = \lfloor C/4 \rfloor$.

Therefore,

$y + \lfloor y/4 \rfloor - \lfloor y/100 \rfloor + \lfloor y/400 \rfloor \equiv N + 5C + \lfloor N/4 \rfloor + \lfloor C/4 \rfloor \pmod 7$
$\equiv N + \lfloor N/4 \rfloor + \lfloor C/4 \rfloor - 2C \pmod 7$. •

This formula is simpler than the first one. For example, the number corre- 
sponding to July 4, 1776, is now obtained as 

$4 + 5 + 76 + 19 + 4 - 34 = 74 \equiv 4 \pmod 7$,

agreeing with our previous calculation in Example 1.75. The reader may now 
discover the day of his or her birth. 

Example 1.77. 

The birthday of Amalia, the grandmother of Danny and Ella, is December 5, 
1906; on what day of the week was she born? 

If A is the number of the day, then

$A \equiv 5 + 4 + 6 + \lfloor 6/4 \rfloor + \lfloor 19/4 \rfloor - 38$
$\equiv -18 \pmod 7$
$\equiv 3 \pmod 7$.

Amalia was born on a Wednesday. ◄
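The century form of the formula in Corollary 1.76 can be sketched the same way. The function name `day_number_century` is hypothetical; months are given in the usual numbering (1 = January) and shifted to the March-based numbering inside the function, and `datetime` is used only as an outside check.

```python
import datetime

# j(m) for m = 1 (March), ..., 12 (February), as listed in the text.
J = [2, 5, 0, 3, 5, 1, 4, 6, 2, 4, 0, 3]


def day_number_century(year: int, month: int, day: int) -> int:
    """Return 0 (Sunday) through 6 (Saturday) using y = 100C + N."""
    if month <= 2:
        m, y = month + 10, year - 1     # January, February belong to the previous year
    else:
        m, y = month - 2, year
    c, n = divmod(y, 100)               # y = 100C + N with 0 <= N <= 99
    return (day + J[m - 1] + n + n // 4 + c // 4 - 2 * c) % 7


assert day_number_century(1906, 12, 5) == 3   # Wednesday, as in Example 1.77
assert day_number_century(1776, 7, 4) == 4    # Thursday
assert day_number_century(1906, 12, 5) == datetime.date(1906, 12, 5).isoweekday() % 7
```

Python's `%` operator returns a nonnegative remainder even when the sum is negative, so the −18 of the worked example comes out directly as 3.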

Does every year y contain a Friday 13? We have

$5 \equiv 13 + j(m) + g(y) \pmod 7$.

The question is answered positively if the numbers j(m), as m varies from 1
through 12, give all the remainders 0 through 6 mod 7. And this is what happens.
The sequence of remainders mod 7 is

2, 5, 0, 3, 5, 1, 4, 6, 2, 4, 0, 3.

Indeed, the seven entries 0, 3, 5, 1, 4, 6, 2 for m = 3 through m = 9 already give
every remainder, and so there must be a Friday 13 occurring between May and
November. No number occurs three times on the list, but it is possible that there
are three Friday 13’s in a year because January and February are viewed as having
occurred in the previous year; for example, there were three Friday 13’s in
1987 (see Exercise 1.91 on page 79). Of course, we may replace Friday by any
other day of the week, and we may replace 13 by any number between 1 and 28.
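These claims about Friday 13 can be checked by brute force. The sketch below uses Python's `datetime` module as an independent source of weekdays rather than the calendar formula; the function name `friday_13_months` is an illustrative choice, and the range of years tested is arbitrary.

```python
import datetime


def friday_13_months(year: int) -> list[int]:
    """Return the months (1 = January) of `year` whose 13th is a Friday."""
    return [month for month in range(1, 13)
            if datetime.date(year, month, 13).isoweekday() == 5]


# Three Friday 13's in 1987: February, March, and November.
assert friday_13_months(1987) == [2, 3, 11]

# Over one full 400-year Gregorian cycle, every year has between one and three.
counts = [len(friday_13_months(year)) for year in range(1601, 2001)]
assert min(counts) == 1 and max(counts) == 3
```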

J. H. Conway has found an even simpler calendar formula. In his system, he 
calls doomsday of a year that day of the week on which the last day of February 
occurs. For example, doomsday 1900, corresponding to February 28, 1900 (1900 
is not a leap year), is Wednesday = 3, while doomsday 2000, corresponding to 
February 29, 2000, is Tuesday = 2, as we know from Corollary 1.76. 

Knowing the doomsday of a century year 100C, one can find the doomsday 
of any other year y = 100C + N in that century, as follows. Since 100C is 
a century year, the number of leap years from 100C to y does not involve the 
Gregorian alteration. Thus, if D is doomsday 100C (of course, $0 \le D \le 6$),
then doomsday 100C + N is congruent to

$D + N + \lfloor N/4 \rfloor \pmod 7$.

For example, since doomsday 1900 is Wednesday = 3, we see that doomsday
1994 is Monday = 1, for

$3 + 94 + 23 = 120 \equiv 1 \pmod 7$.

Proposition 1.78 (Conway). Let D be doomsday 100C, and let $0 \le N \le 99$.
If N = 12q + r, where $0 \le r < 12$, then the formula for doomsday 100C + N is

$D + q + r + \lfloor r/4 \rfloor \pmod 7$.


Proof.

Doomsday(100C + N) $\equiv D + N + \lfloor N/4 \rfloor$
$= D + 12q + r + \lfloor (12q + r)/4 \rfloor$
$= D + 15q + r + \lfloor r/4 \rfloor$
$\equiv D + q + r + \lfloor r/4 \rfloor \pmod 7$. •

For example, 94 = 12 × 7 + 10, so that doomsday 1994 is 3 + 7 + 10 + 2 = 22, and
$22 \equiv 1 \pmod 7$; that is, doomsday 1994 is Monday, as we saw above.
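Conway's rule is easy to test against the longer expression D + N + ⌊N/4⌋ and against an actual calendar. In this Python sketch the function name `doomsday` is hypothetical, the century doomsdays 3 (for 1900) and 2 (for 2000) are the values quoted above, and `datetime` supplies the weekday of the last day of February.

```python
import datetime


def doomsday(century_doomsday: int, n: int) -> int:
    """Doomsday of year 100C + n, given D = doomsday of the century year 100C."""
    q, r = divmod(n, 12)                # n = 12q + r with 0 <= r < 12
    return (century_doomsday + q + r + r // 4) % 7


# Doomsday 1994 is Monday = 1.
assert doomsday(3, 94) == 1

# The shortcut agrees with D + N + floor(N/4) for every N from 0 to 99.
assert all(doomsday(3, n) == (3 + n + n // 4) % 7 for n in range(100))

# Compare with the weekday of the last day of February (the day before March 1).
for year in range(1900, 2000):
    last_feb = datetime.date(year, 3, 1) - datetime.timedelta(days=1)
    assert doomsday(3, year - 1900) == last_feb.isoweekday() % 7
```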

Once one knows doomsday of a particular year, one can use various tricks 
(e.g., my Uncle Charles) to pass from doomsday to any other day in the year. 
Conway observes that some other dates falling on the same day of the week as 
the doomsday are 

April 4, June 6, August 8, October 10, December 12,

May 9, July 11, September 5, and November 7;

it is easier to remember these dates using the notation

4/4, 6/6, 8/8, 10/10, 12/12, and 5/9, 7/11, 9/5, and 11/7,

where m/d denotes month/day (we now return to the usual counting having Jan- 
uary as the first month: 1 = January). Since doomsday corresponds to the last 
day of February, we are now within a few weeks of any date in the calendar, and 
we can easily interpolate to find the desired day. 


Exercises 

1.88 A suspect said that he had spent the Easter holiday April 21, 1893, with his ailing 
mother; Sherlock Holmes challenged his veracity at once. How could the great 
detective have been so certain? 

1.89 How many times in 1900 did the first day of the month fall on a Tuesday?

1.90 On what day of the week did February 29, 1896, fall? Conclude from your method
of solution that no extra fuss is needed to find leap days.

*1.91 (i) Show that 1987 had three Friday 13’s.

(ii) Show, for any year y > 0, that g(y) − g(y − 1) = 1 or 2, where
$g(y) = y + \lfloor y/4 \rfloor - \lfloor y/100 \rfloor + \lfloor y/400 \rfloor$.

(iii) Can there be a year with exactly one Friday 13?


Group theory was invented by E. Galois (1811-1832) in order to solve one
of the premier mathematical problems of his day: when can the roots of a polynomial
be found by some generalization of the quadratic formula? Since Galois
(who was killed in a duel when he was only 20 years old), group theory has found
many other applications. For example, we shall give a new proof of Fermat’s theorem
(if p is prime, then $a^p \equiv a \pmod p$), and this proof will then be adapted
to prove a theorem of Euler: if $m \ge 2$ and a is relatively prime to m, then
$a^{\phi(m)} \equiv 1 \pmod m$, where $\phi(m)$ is the Euler φ-function. We will also use
groups to solve counting problems such
as: How many different bracelets having 10 beads can be assembled from a pile
containing 10 red beads, 10 white beads, and 10 blue beads? In Chapter 6, we
will illustrate the fact that groups are a precise way to describe symmetry by
classifying all possible friezes.


2.1 Some Set Theory 

We are going to study algebraic systems called groups, which involve objects 
that can be “multiplied,” and rings, which involve objects that can be multiplied 
and added. There are interesting examples of these systems whose elements 
are functions, but, more importantly, certain functions (called homomorphisms) 
arise in comparing such systems. This section contains definitions and basic 
properties of functions, but the reader may skim this section now and return to it 
later when necessary. 

A set X is a collection of elements (numbers, points, herring, etc.); one writes 

$x \in X$ 

to denote x belonging to X. The terms set, element, and belongs to are undefined 
terms (there have to be such in any language), and they are used so that a set is 
determined by the elements in it.¹ Thus, we define two sets X and Y to be equal, 
denoted by 

$X = Y$, 

if they consist of exactly the same elements: for every element x, we have 
$x \in X$ if and only if $x \in Y$. 

A subset of a set X is a set S each of whose elements also belongs to X: if 
$s \in S$, then $s \in X$. One denotes S being a subset of X by 

$S \subseteq X$; 

synonyms for this are S is contained in X and S is included in X. Note that 
$X \subseteq X$ is always true; we say that a subset S of X is a proper subset of X, 
denoted by $S \subsetneq X$, if $S \subseteq X$ and $S \ne X$. It follows from the definitions that two 
sets X and Y are equal if and only if each is a subset of the other: 

$X = Y$ if and only if $X \subseteq Y$ and $Y \subseteq X$. 

Because of this remark, many proofs showing that two sets are equal break into 
two parts, each half showing that one of the sets is a subset of the other. For 
example, let 


$X = \{a \in \mathbb{R} : a \ge 0\}$ and $Y = \{r^2 : r \in \mathbb{R}\}$. 

If $a \in X$, then $a \ge 0$ and $a = r^2$, where $r = \sqrt{a}$; hence, $a \in Y$ and $X \subseteq Y$. 

For the reverse inclusion, choose $r^2 \in Y$. If $r \ge 0$, then $r^2 \ge 0$; if $r < 0$, then 
$r = -s$, where $s > 0$, and $r^2 = (-1)^2 s^2 = s^2 \ge 0$. In either case, $r^2 \ge 0$ and 
$r^2 \in X$. Therefore, $Y \subseteq X$, and so $X = Y$. 

Definition. The empty set is the set $\emptyset$ having no elements. 

We claim, for every set X, that $\emptyset \subseteq X$. The negation of “If $s \in \emptyset$, then 
$s \in X$” is “There exists $s \in \emptyset$ with $s \notin X$;” as there is no $s \in \emptyset$, however, 
this cannot be true. It follows that there is a unique empty set, for if $\emptyset_1$ were a 
second such, then $\emptyset \subseteq \emptyset_1$ and, similarly, $\emptyset_1 \subseteq \emptyset$. Therefore, $\emptyset = \emptyset_1$. 

Here are some ways to create new sets from old. 

Definition. If X and Y are subsets of a set Z, then their intersection is the set 

$X \cap Y = \{z \in Z : z \in X \text{ and } z \in Y\}$. 

More generally, if $\{A_i : i \in I\}$ is any, possibly infinite, family of subsets of a set 
Z, then their intersection is 

$\bigcap_{i \in I} A_i = \{z \in Z : z \in A_i \text{ for all } i \in I\}$. 

It is clear that $X \cap Y \subseteq X$ and $X \cap Y \subseteq Y$. In fact, the intersection is 
the largest such subset: if $S \subseteq X$ and $S \subseteq Y$, then $S \subseteq X \cap Y$. Similarly, 
$\bigcap_{i \in I} A_i \subseteq A_j$ for all $j \in I$. 

¹ There are some rules governing the usage of $\in$; for example, $x \in a \in x$ is always a false 
statement. 


Definition. If X and Y are subsets of a set Z, then their union is the set 

$X \cup Y = \{z \in Z : z \in X \text{ or } z \in Y\}$. 

More generally, if $\{A_i : i \in I\}$ is any, possibly infinite, family of subsets of a set 
Z, then their union is 

$\bigcup_{i \in I} A_i = \{z \in Z : z \in A_i \text{ for some } i \in I\}$. 

It is clear that $X \subseteq X \cup Y$ and $Y \subseteq X \cup Y$. In fact, the union is the smallest 
such subset: if $X \subseteq S$ and $Y \subseteq S$, then $X \cup Y \subseteq S$. Similarly, $A_j \subseteq \bigcup_{i \in I} A_i$ 
for all $j \in I$. 

Definition. If X and Y are sets, then their difference is the set 

$X - Y = \{x \in X : x \notin Y\}$. 

The difference $Y - X$ has a similar definition and, of course, $Y - X$ and $X - Y$ 
need not be equal. 

In particular, if X is a subset of a set Z, then its complement in Z is the set 

$X' = Z - X = \{z \in Z : z \notin X\}$. 

It is clear that $X'$ is disjoint from X; that is, there is no element $z \in Z$ lying 
in both X and $X'$, so that $X \cap X' = \emptyset$. (Thus, the empty set $\emptyset$ is needed to 
guarantee that the intersection of two subsets A and B always be a subset; that 
is, $A \cap B$ should always be defined.) In fact, $X'$ is the largest subset of Z disjoint 
from X: if $S \subseteq Z$ and $S \cap X = \emptyset$, then $S \subseteq X'$. 
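
For finite sets, the operations just defined can be tried out directly with Python's built-in set type. The following is only a sketch; the particular sets Z, X, Y and the family below are illustrative choices of ours, not from the text.

```python
# Illustrative choice of sets: Z is the ambient set, X and Y are subsets of Z.
Z = set(range(12))
X = {z for z in Z if z % 2 == 0}   # even numbers in Z
Y = {z for z in Z if z % 3 == 0}   # multiples of 3 in Z

intersection = X & Y               # {z in Z : z in X and z in Y}
union = X | Y                      # {z in Z : z in X or z in Y}
difference = X - Y                 # {x in X : x not in Y}
complement = Z - X                 # X' = Z - X

# Intersection and union of a whole family {A_i : i in I}.
family = [X, Y, {0, 6, 7}]
family_intersection = set.intersection(*family)
family_union = set.union(*family)

print(sorted(intersection))        # [0, 6]
print(sorted(complement))          # [1, 3, 5, 7, 9, 11]
print(complement & X == set())     # True: X' is disjoint from X
```

The operators `<=` and `<` on Python sets test for subset and proper subset, respectively.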

Functions 

The idea of a function occurs in calculus (and earlier); examples are $x^2$, $\sin x$, 
$\sqrt{x}$, $1/x$, $x + 1$, $e^x$, etc. Calculus books define a function $f(x)$ as a “rule” that 
assigns, to each number a, exactly one number, namely, $f(a)$. Thus, the squaring 
function assigns the number 81 to the number 9; the square root function assigns 
the number 3 to the number 9. Notice that there are two candidates for $\sqrt{9}$, 
namely, 3 and $-3$. In order that there be exactly one number assigned to 9, one 
must select one of the two possible values $\pm 3$; everyone has agreed that $\sqrt{x} \ge 0$ 
whenever $x \ge 0$, and so this agreement implies that $\sqrt{x}$ is a function. 

The calculus definition of function is certainly in the right spirit, but it has a 
defect: what is a rule? To ask this question another way, when are two rules the 
same? For example, consider the functions 

$f(x) = (x + 1)^2$ and $g(x) = x^2 + 2x + 1$. 

Is $f(x) = g(x)$? The evaluation procedures are certainly different: for example, 
$f(6) = (6 + 1)^2 = 7^2$, while $g(6) = 6^2 + 2 \cdot 6 + 1 = 36 + 12 + 1$. Since the term 
rule has not been defined, it is ambiguous, and our question cannot be answered. 
Surely the calculus description is inadequate if one cannot decide whether these 
two functions are equal. 

To find a reasonable definition, let us return to examples of what we seek 
to define. Each of the functions $x^2$, $\sin x$, etc., has a graph, namely, the subset 
of the plane consisting of all those points of the form $(a, f(a))$. For example, 
the graph of $f(x) = x^2$ is the parabola consisting of all the points of the form 
$(a, a^2)$. 

A graph is a concrete thing, and the upcoming formal definition of a function 
amounts to saying that a function is its graph. The informal calculus definition 
of a function as a rule remains, but we will have avoided the problem of saying 
what a rule is. In order to give the definition, we first need an analog of the plane 
(for we will want to use functions $f(x)$ whose argument x does not vary over 
numbers). 

Definition. If X and Y are (not necessarily distinct) sets, then their cartesian² 
product $X \times Y$ is the set of all ordered pairs $(x, y)$, where $x \in X$ and $y \in Y$. 

The plane is $\mathbb{R} \times \mathbb{R}$. 

The only thing one needs to know about ordered pairs is that 

$(x, y) = (x', y')$ if and only if $x = x'$ and $y = y'$ 

(see Exercise 2.4 on page 101). 

Observe that if X and Y are finite sets, say, $|X| = m$ and $|Y| = n$ (we denote 
the number of elements in a finite set X by $|X|$), then $|X \times Y| = mn$. 

Definition. Let X and Y be (not necessarily distinct) sets. A function f from 
X to Y, denoted by 

$f : X \to Y$, 

is a subset $f \subseteq X \times Y$ such that, for each $a \in X$, there is a unique $b \in Y$ with 
$(a, b) \in f$. 

For each $a \in X$, the unique element $b \in Y$ for which $(a, b) \in f$ is called the 
value of f at a, and b is denoted by $f(a)$. Thus, f consists of all those points in 
$X \times Y$ of the form $(a, f(a))$. When $f : \mathbb{R} \to \mathbb{R}$, then f is the graph of $f(x)$. 
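
When X and Y are finite, the definition can be checked mechanically. Here is a minimal Python sketch in which a candidate function is stored as a set of ordered pairs; the helper names is_function and value, and the sample sets, are our own illustrative choices.

```python
def is_function(graph, X, Y):
    """Return True if graph (a set of ordered pairs) is a function from X to Y."""
    # graph must be a subset of the cartesian product X x Y
    if not all(a in X and b in Y for (a, b) in graph):
        return False
    # each a in X must occur as a first coordinate exactly once
    return all(sum(1 for (x, _) in graph if x == a) == 1 for a in X)

def value(graph, a):
    """The unique b with (a, b) in graph, that is, f(a)."""
    (b,) = [y for (x, y) in graph if x == a]
    return b

X = {1, 2, 3}
Y = {"u", "v"}
f = {(1, "u"), (2, "u"), (3, "v")}

print(is_function(f, X, Y))                 # True
print(is_function(f | {(1, "v")}, X, Y))    # False: 1 would have two values
print(is_function(f - {(3, "v")}, X, Y))    # False: 3 would have no value
print(value(f, 3))                          # v
```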

Example 2.1. 

(i) The identity function on a set X, denoted by $1_X : X \to X$, is defined by 

$1_X(x) = x$ for every $x \in X$ [when $X = \mathbb{R}$, the graph of the identity 
function is the 45° line through the origin consisting of all those points in 
the plane of the form $(a, a)$]. 

(ii) Constant functions: If $y_0 \in Y$, then $f(x) = y_0$ for all $x \in X$ (when 
$X = \mathbb{R} = Y$, then the graph of a constant function is a horizontal line). ◄ 

² This term honors R. Descartes, one of the founders of analytic geometry. 

From now on, we depart from the calculus notation; we denote a function by 
f and not by $f(x)$; the latter notation is reserved for the value of f at an element 
x (there are a few exceptions; we will continue to write the familiar functions, 
e.g., polynomials, $\sin x$, $e^x$, $\sqrt{x}$, $\log x$, as usual). Here are some more words. 
If $f : X \to Y$, call X the domain of f, call Y the target (or codomain) of f, 
and define the image (or range) of f, denoted by $\operatorname{im} f$, to be the subset of Y 
consisting of all the values of f. When we say that X is the domain of a function 
$f : X \to Y$, we mean that $f(x)$ is defined for every $x \in X$. For example, the 
domain of $\sin x$ is $\mathbb{R}$, its target is usually $\mathbb{R}$, and its image is $[-1, 1]$. The domain 
of $1/x$ is the set of all nonzero reals and its image is also the nonzero reals; the 
domain of the square root function is the set $\mathbb{R}^{\ge} = \{x \in \mathbb{R} : x \ge 0\}$ of all 
nonnegative reals and its image is also $\mathbb{R}^{\ge}$. 

Definition. Functions $f : X \to Y$ and $g : X' \to Y'$ are equal if $X = X'$, 
$Y = Y'$, and the subsets $f \subseteq X \times Y$ and $g \subseteq X' \times Y'$ are equal. 

A function $f : X \to Y$ has three ingredients: its domain X, its target Y, 
and its graph, and we are saying that two functions are equal if and only if they 
have the same domains, the same targets, and the same graphs. It is plain that 
the domain and the graph are essential parts of a function, and some reasons for 
caring about the target are given in a remark at the end of this section. 

Definition. If $f : X \to Y$ is a function, and if S is a subset of X, then the 
restriction of f to S is the function $f|S : S \to Y$ defined by $(f|S)(s) = f(s)$ 
for all $s \in S$. 

If S is a subset of a set X, define the inclusion $i : S \to X$ to be the function 
defined by $i(s) = s$ for all $s \in S$. 

If S is a proper subset of X, then the inclusion i is not the identity function 
$1_S$ because its target is X, not S; it is not the identity function $1_X$ because its 
domain is S, not X. If S is a proper subset of X, then $f|S \ne f$ because they 
have different domains. 

Proposition 2.2. Let $f : X \to Y$ and $g : X' \to Y'$ be functions. Then $f = g$ if 
and only if $X = X'$, $Y = Y'$, and $f(a) = g(a)$ for every $a \in X$. 

Remark. This proposition resolves the problem raised by the ambiguous term 
rule. If $f, g : \mathbb{R} \to \mathbb{R}$ are given by $f(x) = (x + 1)^2$ and $g(x) = x^2 + 2x + 1$, 
then $f = g$ because $f(a) = g(a)$ for every number a. ◄ 

Proof. Assume that $f = g$. Functions are subsets of $X \times Y$, and so $f = g$ 
means that each of f and g is a subset of the other (informally, we are saying 
that f and g have the same graph). If $a \in X$, then $(a, f(a)) \in f = g$, and so 
$(a, f(a)) \in g$. But there is only one ordered pair in g with first coordinate a, 
namely, $(a, g(a))$ [because the definition of function says that g gives a unique 
value to a]. Therefore, $(a, f(a)) = (a, g(a))$, and equality of ordered pairs gives 
$f(a) = g(a)$, as desired. 

Conversely, assume that $f(a) = g(a)$ for every $a \in X$. To see that $f = g$, 
it suffices to show that $f \subseteq g$ and $g \subseteq f$. Each element of f has the form 
$(a, f(a))$. Since $f(a) = g(a)$, we have $(a, f(a)) = (a, g(a))$, and hence 
$(a, f(a)) \in g$. Therefore, $f \subseteq g$. The reverse inclusion $g \subseteq f$ is proved 
similarly. • 

Let us make the contrapositive explicit: if $f, g : X \to Y$ are functions that 
disagree at even one point, i.e., if there is some $a \in X$ with $f(a) \ne g(a)$, then 

$f \ne g$. 

We continue to regard a function f as a rule sending $x \in X$ to $f(x) \in Y$, but 
the precise definition is now available whenever we need it, as in Proposition 2.2. 
However, to reinforce our wanting to regard functions $f : X \to Y$ as dynamic 
things sending points in X to points in Y, we often write 

$f : x \mapsto y$ 

instead of $f(x) = y$. For example, we may write $f : x \mapsto x^2$ instead of 
$f(x) = x^2$, and we may describe the identity function by $x \mapsto x$ for all x. 
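
Proposition 2.2 can be illustrated numerically: two differently written rules give the same function exactly when they give the same graph. The Python sketch below compares graphs over a finite stand-in for the domain (the integers from -50 to 50, an assumption of ours); it illustrates the statement and does not prove equality on all of $\mathbb{R}$.

```python
def f(x):
    return (x + 1) ** 2          # one "rule"

def g(x):
    return x * x + 2 * x + 1     # a different-looking "rule"

def h(x):
    return x * x + 1             # disagrees with f at every a != 0

# A finite stand-in for the common domain (assumption: integers -50..50).
domain = range(-50, 51)

graph_f = {(a, f(a)) for a in domain}
graph_g = {(a, g(a)) for a in domain}
graph_h = {(a, h(a)) for a in domain}

print(graph_f == graph_g)        # True: same graph, so the same function on this domain
print(graph_f == graph_h)        # False: the graphs differ, so the functions differ
```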


Example 2.3. 

Our definitions allow us to treat a degenerate case. If X is a set, what are the 
functions $X \to \emptyset$? Note first that an element of $X \times \emptyset$ is an ordered pair $(x, y)$ 
with $x \in X$ and $y \in \emptyset$; since there is no $y \in \emptyset$, there are no such ordered 
pairs, and so $X \times \emptyset = \emptyset$. Now a function $X \to \emptyset$ is a subset of $X \times \emptyset$ of a 
certain type; but $X \times \emptyset = \emptyset$, so there is only one subset, namely $\emptyset$, and hence 
at most one function, namely, $f = \emptyset$. The definition of function $X \to \emptyset$ says 
that, for each $x \in X$, there exists a unique $y \in \emptyset$ with $(x, y) \in f$. If $X \ne \emptyset$, 
then there exists $x \in X$ for which no such y exists (there are no elements y at 
all in $\emptyset$), and so f is not a function. Thus, if $X \ne \emptyset$, there are no functions 
from X to $\emptyset$. On the other hand, if $X = \emptyset$, we claim that $f = \emptyset$ is a function. 
Otherwise, the negation of the statement “f is a function” would be true: “there 
exists $x \in \emptyset$, etc.” We need not go on; since $\emptyset$ has no elements in it, there is 
no way to complete the sentence so that it is a true statement. We conclude that 
$f = \emptyset$ is a function $\emptyset \to \emptyset$, and we declare it to be the identity function $1_{\emptyset}$. 

◄ 


There is a name for functions whose image is equal to the whole target. 

Definition. A function $f : X \to Y$ is surjective (or onto) if 

$\operatorname{im} f = Y$. 

Thus, f is surjective if, for each $y \in Y$, there is some $x \in X$ (probably 
depending on y) with $y = f(x)$. 

Example 2.4. 

(i) Of course, identity functions are surjections. 

(ii) The sine function $\mathbb{R} \to \mathbb{R}$ is not surjective, for its image is $[-1, 1]$, which 
is a proper subset of its target $\mathbb{R}$. 

(iii) The functions $x^2 : \mathbb{R} \to \mathbb{R}$ and $e^x : \mathbb{R} \to \mathbb{R}$ have target $\mathbb{R}$. Now $\operatorname{im} x^2$ 
consists of the nonnegative reals and $\operatorname{im} e^x$ consists of the positive reals, so 
that neither $x^2$ nor $e^x$ is surjective. 

(iv) Let $f : \mathbb{R} \to \mathbb{R}$ be defined by 

$f(a) = 6a + 4$. 

To see whether f is a surjection, we ask whether every $b \in \mathbb{R}$ has the form 
$b = f(a)$ for some a; that is, given b, can one find a so that 

$6a + 4 = b$? 

One can always solve this equation for a, obtaining $a = \frac{1}{6}(b - 4)$. 
Therefore, f is a surjection. 

(v) Let $f : \mathbb{R} - \{\frac{3}{2}\} \to \mathbb{R}$ be defined by 

$f(a) = \frac{6a + 4}{2a - 3}$. 

To see whether f is a surjection, we seek a solution a for a given b: can 
we always solve 

$b = \frac{6a + 4}{2a - 3}$? 

This leads to the equation $a(6 - 2b) = -3b - 4$, which can be solved for 
a if $6 - 2b \ne 0$ [note that $(-3b - 4)/(6 - 2b) \ne 3/2$]. On the other 
hand, it suggests that there is no solution when $b = 3$ and, indeed, there is 
not: if $(6a + 4)/(2a - 3) = 3$, cross multiplying gives the false equation 
$6a + 4 = 6a - 9$. Thus, $3 \notin \operatorname{im} f$, and f is not a surjection. ◄ 
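
The computation in Example 2.4(v) can be replayed with exact rational arithmetic. The Python sketch below uses the standard fractions module; the helper name solve_for_a is ours, and it simply encodes the equation $a(6 - 2b) = -3b - 4$ derived above.

```python
from fractions import Fraction

def f(a):
    """f(a) = (6a + 4)/(2a - 3), defined for a != 3/2."""
    a = Fraction(a)
    return (6 * a + 4) / (2 * a - 3)

def solve_for_a(b):
    """Return a with f(a) == b, or None if b is not in the image of f."""
    b = Fraction(b)
    if 6 - 2 * b == 0:           # b == 3: a(6 - 2b) = -3b - 4 has no solution
        return None
    return (-3 * b - 4) / (6 - 2 * b)

for b in [0, 1, Fraction(7, 2), -10]:
    a = solve_for_a(b)
    print(b, a, f(a) == b)       # the last entry is True in every row

print(solve_for_a(3))            # None
```

Using Fraction rather than floating-point numbers keeps the check `f(a) == b` exact.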
Instead of saying that the values of a function f are unique, one sometimes 
says that f is single-valued. For example, if $\mathbb{R}^{\ge}$ denotes the set of nonnegative 
reals, then $\sqrt{\ } : \mathbb{R}^{\ge} \to \mathbb{R}^{\ge}$ is a function because we have agreed that $\sqrt{a} \ge 0$ for 
every nonnegative number a. On the other hand, $f(a) = \pm\sqrt{a}$ is not single-valued, 
and hence it is not a function. 

The simplest way to verify whether an alleged function f is single-valued is 
to phrase uniqueness of values as an implication: 

if $a = a'$, then $f(a) = f(a')$. 

Does the formula $g(\frac{a}{b}) = ab$ define a function $g : \mathbb{Q} \to \mathbb{Q}$? There are many 
ways to write a fraction; since $\frac{1}{2} = \frac{3}{6}$, we see that $g(\frac{1}{2}) = 1 \cdot 2 \ne 3 \cdot 6 = g(\frac{3}{6})$, 
and so g is not a function. Had we said that the formula $g(\frac{a}{b}) = ab$ holds 
whenever $\frac{a}{b}$ is in lowest terms, then g would be a function. 

The formula $f(\frac{a}{b}) = 3\frac{a}{b}$ does define a function $f : \mathbb{Q} \to \mathbb{Q}$, for it is 
single-valued: if $\frac{a}{b} = \frac{a'}{b'}$, we show that 

$f(\frac{a}{b}) = 3\frac{a}{b} = 3\frac{a'}{b'} = f(\frac{a'}{b'})$. 

Now $\frac{a}{b} = \frac{a'}{b'}$ gives $ab' = a'b$, so that $3ab' = 3a'b$ and $3\frac{a}{b} = 3\frac{a'}{b'}$. Thus, f is a 
bona fide function. 
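
The distinction between a rule on representations and a function on $\mathbb{Q}$ is easy to see in code. In the following Python sketch (the function names are ours), each rule takes a numerator a and a nonzero denominator b; a rule defines a function on $\mathbb{Q}$ only if equal fractions always produce equal outputs.

```python
from fractions import Fraction
from math import gcd

def g_naive(a, b):
    """The rule a/b -> ab applied to a particular numerator and denominator."""
    return a * b

def g_lowest_terms(a, b):
    """The same rule, but applied only after reducing a/b to lowest terms."""
    d = gcd(a, b)
    return (a // d) * (b // d)

def f(a, b):
    """The rule a/b -> 3a/b."""
    return Fraction(3 * a, b)

# 1/2 and 3/6 are the same rational number ...
print(Fraction(1, 2) == Fraction(3, 6))              # True
# ... but the naive rule gives two different answers, so it is not single-valued.
print(g_naive(1, 2), g_naive(3, 6))                  # 2 18
print(g_lowest_terms(1, 2), g_lowest_terms(3, 6))    # 2 2
print(f(1, 2) == f(3, 6))                            # True
```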

The following definition gives another important property a function may 
have. 


Definition. A function $f : X \to Y$ is injective (or one-to-one) if, whenever 
a and $a'$ are distinct elements of X, then $f(a) \ne f(a')$. Equivalently (the 
contrapositive states that), f is injective if, for every pair $a, a' \in X$, we have 

$f(a) = f(a')$ implies $a = a'$. 

The reader should note that being injective is the converse of being 
single-valued: f is single-valued if $a = a'$ implies $f(a) = f(a')$; f is injective if 
$f(a) = f(a')$ implies $a = a'$. 

Most functions are neither injective nor surjective. For example, the squaring 
function $f : \mathbb{R} \to \mathbb{R}$, defined by $f(x) = x^2$, is neither. 


Example 2.5. 


(i) Identity functions $1_X$ are injective. 

(ii) Let $f : \mathbb{R} - \{\frac{3}{2}\} \to \mathbb{R}$ be defined by 

$f(a) = \frac{6a + 4}{2a - 3}$. 

To check whether f is injective, suppose that $f(a) = f(b)$: 

$\frac{6a + 4}{2a - 3} = \frac{6b + 4}{2b - 3}$. 

Cross multiplying yields 

$12ab + 8b - 18a - 12 = 12ab + 8a - 18b - 12$, 

which simplifies to $26a = 26b$ and hence $a = b$. We conclude that f is 
injective. (We saw, in Example 2.4(v), that f is not surjective.) 

(iii) Consider $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = x^2 - 2x - 3$. If we try to check 
whether f is an injection by looking at the consequences of $f(a) = f(b)$, 
as in part (ii), we arrive at the equation $a^2 - 2a = b^2 - 2b$; it is not instantly 
clear whether this forces $a = b$. Instead, we seek the roots of $f(x)$, which 
are 3 and $-1$. It follows that f is not injective, for $f(3) = 0 = f(-1)$; 
thus, there are two distinct numbers having the same value. ◄ 
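
Both parts of Example 2.5 can be spot-checked on a finite sample of points. The Python sketch below is an illustration under assumptions of ours (the helper is_injective_on and the choice of sample); a True result on a sample is consistent with injectivity but does not prove it, whereas a single collision does prove non-injectivity.

```python
from fractions import Fraction

def f(a):
    """Example 2.5(ii): f(a) = (6a + 4)/(2a - 3), for a != 3/2."""
    a = Fraction(a)
    return (6 * a + 4) / (2 * a - 3)

def q(x):
    """Example 2.5(iii): x^2 - 2x - 3."""
    return x * x - 2 * x - 3

def is_injective_on(func, points):
    """True if func takes distinct values at the given distinct points."""
    values = [func(p) for p in points]
    return len(set(values)) == len(values)

# Assumed sample: quarter-integers in [-10, 10], leaving out 3/2.
sample = [Fraction(n, 4) for n in range(-40, 41) if Fraction(n, 4) != Fraction(3, 2)]

print(is_injective_on(f, sample))     # True (consistent with the proof, not a proof)
print(q(3), q(-1))                    # 0 0
print(is_injective_on(q, [3, -1]))    # False
```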

Sometimes there is a way of combining two functions to form another func- 
tion, their composite. 


Definition. If $f : X \to Y$ and $g : Y \to Z$ are functions (the target of f is the 
domain of g), then their composite, denoted by $g \circ f$, is the function $X \to Z$ 
given by 

$g \circ f : x \mapsto g(f(x))$; 

that is, first evaluate f on x and then evaluate g on $f(x)$. 

Composition is thus a two-step process: $x \mapsto f(x) \mapsto g(f(x))$. For example, 
the function $h : \mathbb{R} \to \mathbb{R}$, defined by $h(x) = e^{\cos x}$, is the composite $g \circ f$, 
where $f(x) = \cos x$ and $g(x) = e^x$. This factorization is plain as soon as one 
tries to evaluate, say, $h(\pi)$; one must first evaluate $f(\pi) = \cos \pi = -1$ and then 
evaluate $g(f(\pi)) = g(-1) = e^{-1}$ in order to evaluate $h(\pi)$. The chain rule in 
calculus is a formula for computing the derivative $(g \circ f)'$ in terms of $g'$ and $f'$: 

$(g \circ f)'(x) = g'(f(x)) \cdot f'(x)$. 

If $f : \mathbb{N} \to \mathbb{N}$ and $g : \mathbb{N} \to \mathbb{R}$ are functions, then $g \circ f : \mathbb{N} \to \mathbb{R}$ is defined, 
but $f \circ g$ is not defined [for target(g) $= \mathbb{R} \ne \mathbb{N} =$ domain(f)]. Even when 
$f : X \to Y$ and $g : Y \to X$, so that both composites $g \circ f$ and $f \circ g$ are defined, 
these composites need not be equal. For example, define $f, g : \mathbb{N} \to \mathbb{N}$ by 
$f : n \mapsto n^2$ and $g : n \mapsto 3n$; then $g \circ f : 2 \mapsto g(4) = 12$ and $f \circ g : 2 \mapsto f(6) = 36$. 
Hence, $g \circ f \ne f \circ g$. 
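
The example just given translates directly into code. In this Python sketch, compose is a helper of our own that returns $g \circ f$; square and triple play the roles of f and g.

```python
def compose(g, f):
    """Return the composite g o f: first apply f, then g."""
    return lambda x: g(f(x))

def square(n):
    return n * n

def triple(n):
    return 3 * n

g_after_f = compose(triple, square)   # n -> 3 * n^2
f_after_g = compose(square, triple)   # n -> (3n)^2

print(g_after_f(2))   # 12
print(f_after_g(2))   # 36
```

Since the two composites disagree at 2, they are different functions.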

Given a set X, let 


$\mathcal{F}(X) = \{\text{all functions } X \to X\}$. 

The composite of two functions in $\mathcal{F}(X)$ is always defined, and it is, again, a 
function in $\mathcal{F}(X)$. As we have just seen, composition is not commutative; that 
is, $f \circ g$ and $g \circ f$ need not be equal. Let us now show that composition is always 
associative. 

Lemma 2.6. Composition of functions is associative: if 

$f : X \to Y$, $g : Y \to Z$, and $h : Z \to W$ 

are functions, then 

$h \circ (g \circ f) = (h \circ g) \circ f$. 

Proof. We show that the value of either composite on an element $x \in X$ is just 
$w = h(g(f(x)))$. If $x \in X$, then 

$h \circ (g \circ f) : x \mapsto (g \circ f)(x) = g(f(x)) \mapsto h(g(f(x))) = w$, 

and 

$(h \circ g) \circ f : x \mapsto f(x) \mapsto (h \circ g)(f(x)) = h(g(f(x))) = w$. 

It follows from Proposition 2.2 that the composites are equal. • 

In light of this lemma, we need not write parentheses: the notation $h \circ g \circ f$ 
is unambiguous. 

The next result implies that the identity function $1_X$ behaves for composition 
in $\mathcal{F}(X)$ just as the number one does for multiplication of numbers. 

Lemma 2.7. If $f : X \to Y$, then $1_Y \circ f = f = f \circ 1_X$. 

Proof. If $x \in X$, then 

$1_Y \circ f : x \mapsto f(x) \mapsto f(x)$ 

and 

$f \circ 1_X : x \mapsto x \mapsto f(x)$. • 

Are there “reciprocals” in $\mathcal{F}(X)$; that is, are there any functions f for which 
there is $g \in \mathcal{F}(X)$ with $f \circ g = 1_X$ and $g \circ f = 1_X$? 

Definition. A function $f : X \to Y$ is bijective (or is a one-one correspondence) 
if it is both injective and surjective. 

Example 2.8. 

(i) Identity functions are always bijections. 

(ii) Let $X = \{1, 2, 3\}$ and define $f : X \to X$ by 

$f(1) = 2$, $f(2) = 3$, $f(3) = 1$. 

It is easy to see that f is a bijection. ◄ 

We can draw a picture of a function in the special case when X and Y are 
finite sets. Let $X = \{1, 2, 3, 4, 5\}$, let $Y = \{a, b, c, d, e\}$, and define $f : X \to Y$ 
by 

$f(1) = b$; $f(2) = e$; $f(3) = a$; $f(4) = b$; $f(5) = c$. 

Figure 2.4 A Function 

We see that f is not injective because $f(1) = b = f(4)$, and f is not 
surjective because there is no $x \in X$ with $f(x) = d$. Can one reverse the arrows 
to get a function $g : Y \to X$? There are two reasons why one cannot. First, there 
is no arrow going to d, and so $g(d)$ is not defined. Second, what is $g(b)$? Is 
it 1 or 4? The first problem is that the domain of g is not all of Y, and it arises 
because f is not surjective; the second problem is that g is not single-valued, and 
it arises because f is not injective (this reflects the fact that being single-valued 
is the converse of being injective). Therefore, neither problem arises when f is 
a bijection. 
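
The arrow-reversing discussion can be sketched in Python by storing the function of Figure 2.4 as a dictionary. The helper reverse_arrows is our own name; it records, for each element of Y, every element of X sent to it.

```python
f = {1: "b", 2: "e", 3: "a", 4: "b", 5: "c"}   # the function of Figure 2.4
Y = {"a", "b", "c", "d", "e"}

def reverse_arrows(f, Y):
    """For each y in Y, collect every x with f(x) = y."""
    reversed_arrows = {y: set() for y in Y}
    for x, y in f.items():
        reversed_arrows[y].add(x)
    return reversed_arrows

rev = reverse_arrows(f, Y)
print(rev["d"])            # set(): no arrow arrives at d (f is not surjective)
print(sorted(rev["b"]))    # [1, 4]: two arrows arrive at b (f is not injective)

# Reversing gives a function Y -> X exactly when every y receives one arrow.
print(all(len(xs) == 1 for xs in rev.values()))   # False
```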
Definition. A function $f : X \to Y$ has an inverse if there exists a function 
$g : Y \to X$ with both composites $g \circ f$ and $f \circ g$ being identity functions. 

We do not say that every function f has an inverse; on the contrary, we have 
just analyzed the reasons why most functions do not have an inverse. Notice that 
if an inverse function g does exist, then it “reverses the arrows” in Figure 2.4. 
If $f(a) = y$, then there is an arrow from a to y. Now $g \circ f$ being the identity 
says that $a = (g \circ f)(a) = g(f(a)) = g(y)$; therefore $g : y \mapsto a$, and so the 
picture of g is obtained from the picture of f by reversing arrows. If f twists 
something, then its inverse g untwists it. 


Lemma 2.9. If $f : X \to Y$ and $g : Y \to X$ are functions such that $g \circ f = 1_X$, 
then f is injective and g is surjective. 

Proof. Suppose that $f(a) = f(a')$; apply g to obtain $g(f(a)) = g(f(a'))$; 
that is, $a = a'$ [because $g(f(a)) = a$], and so f is injective. If $x \in X$, then 
$x = g(f(x))$, so that $x \in \operatorname{im} g$; hence g is surjective. • 


Lemma 2.10. A function $f : X \to Y$ has an inverse $g : Y \to X$ if and only if 
it is a bijection. 

Proof. If f has an inverse g, then Lemma 2.9 shows that f is injective and 
surjective, for both composites $g \circ f$ and $f \circ g$ are identities. 

Assume that f is a bijection. Let $y \in Y$. Since f is surjective, there is 
some $a \in X$ with $f(a) = y$; since f is injective, this element a is unique. 
Defining $g(y) = a$ thus gives a (single-valued) function whose domain is Y [g 
merely “reverses arrows:” since $f(a) = y$, there is an arrow from a to y, and 
the reversed arrow goes from y to a]. It is plain that g is the inverse of f; that is, 
$f(g(y)) = f(a) = y$ for all $y \in Y$ and $g(f(a)) = g(y) = a$ for all $a \in X$. • 

Notation. The inverse of a bijection f is denoted by $f^{-1}$ (Exercise 2.8 on 
page 101 says that a function cannot have two inverses). This is the same notation 
used for inverse trigonometric functions in calculus; e.g., $\sin^{-1} x = \arcsin x$ 
satisfies $\sin(\arcsin(x)) = x$ and $\arcsin(\sin(x)) = x$ (once sine is restricted to 
$[-\pi/2, \pi/2]$, where it is injective). Of course, $\sin^{-1}$ does not 
denote the reciprocal $1/\sin x$, which is $\csc x$. 


Example 2.11. 

Here is an example of two functions f and g whose composite $g \circ f$ is the 
identity but whose composite $f \circ g$ is not the identity; thus, f and g are not 
inverse functions. 

Define $f, g : \mathbb{N} \to \mathbb{N}$ as follows: 

$f(n) = n + 1$; 

$g(n) = \begin{cases} 0 & \text{if } n = 0 \\ n - 1 & \text{if } n \ge 1. \end{cases}$ 

The composite $g \circ f = 1_{\mathbb{N}}$, for $g(f(n)) = g(n + 1) = n$ (because $n + 1 \ge 1$). 
On the other hand, $f \circ g \ne 1_{\mathbb{N}}$ because $f(g(0)) = f(0) = 1 \ne 0$. ◄ 
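
Example 2.11 is short enough to run. The Python sketch below checks $g \circ f$ only on an initial segment of $\mathbb{N}$ (an assumption forced by finiteness), while the failure of $f \circ g$ needs just the single point 0.

```python
def f(n):
    return n + 1

def g(n):
    return 0 if n == 0 else n - 1

# Spot check on an initial segment of N (an assumption; N itself is infinite).
print(all(g(f(n)) == n for n in range(1000)))   # True: g o f acts as the identity here
print(f(g(0)))                                  # 1, not 0, so f o g is not the identity
```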


Example 2.12. 

If a is a real number, then multiplication by a is the function $\mu_a : \mathbb{R} \to \mathbb{R}$ defined 
by $r \mapsto ar$ for all $r \in \mathbb{R}$. If $a \ne 0$, then $\mu_a$ is a bijection; its inverse function is 
division by a, namely, $\delta_a : \mathbb{R} \to \mathbb{R}$ defined by $r \mapsto \frac{1}{a} r$; of course, $\delta_a = \mu_{1/a}$. If 
$a = 0$, however, then $\mu_a = \mu_0$ is the constant function $\mu_0 : r \mapsto 0$ for all $r \in \mathbb{R}$, 
which has no inverse function because it is not a bijection. ◄ 

Two strategies are now available to determine whether a given function is a 
bijection: use the definitions of injective and surjective, or find an inverse. For 
example, if $\mathbb{R}^{+}$ denotes the positive real numbers, let us show that the 
exponential function $f : \mathbb{R} \to \mathbb{R}^{+}$, defined by $f(x) = e^x = \sum x^n/n!$, is a bijection. A 
direct proof that f is an injection would require showing that if $e^a = e^b$, then 
$a = b$; a direct proof showing that f is surjective would involve showing that 
every positive real number c has the form $e^a$ for some a. It is simplest to use the 
(natural) logarithm $g(y) = \log y$. The usual formulas $e^{\log y} = y$ and $\log e^x = x$ 
say that both composites $f \circ g$ and $g \circ f$ are identities, and so f and g are inverse 
functions. Therefore, f is a bijection, for it has an inverse. 

Let us summarize the results of this subsection. 


Proposition 2.13. If the set of all the bijections from a set X to itself is denoted 
by $S_X$, then composition of functions satisfies the following properties: 

(i) if $f, g \in S_X$, then $f \circ g \in S_X$; 

(ii) $h \circ (g \circ f) = (h \circ g) \circ f$ for all $f, g, h \in S_X$; 

(iii) the identity $1_X$ lies in $S_X$, and $1_X \circ f = f = f \circ 1_X$ for every $f \in S_X$; 

(iv) for every $f \in S_X$, there is $g \in S_X$ with $g \circ f = 1_X = f \circ g$. 

Proof. We have merely restated results of Exercise 2.13(ii) on page 102, 
Lemmas 2.6, 2.7, and 2.10. • 
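
For a small finite set, the four properties of Proposition 2.13 can be verified by brute force. The Python sketch below takes $X = \{0, 1, 2\}$ (our choice) and stores a bijection f as the tuple $(f(0), f(1), f(2))$; this encoding and the helper compose are assumptions made for the illustration.

```python
from itertools import permutations, product

X = (0, 1, 2)
# A bijection f: X -> X is stored as a tuple p with p[x] = f(x).
S_X = set(permutations(X))
identity = tuple(X)

def compose(g, f):
    """Return g o f, i.e. x -> g(f(x))."""
    return tuple(g[f[x]] for x in X)

closed = all(compose(f, g) in S_X for f, g in product(S_X, repeat=2))
associative = all(
    compose(h, compose(g, f)) == compose(compose(h, g), f)
    for f, g, h in product(S_X, repeat=3)
)
has_identity = identity in S_X and all(
    compose(identity, f) == f == compose(f, identity) for f in S_X
)
has_inverses = all(
    any(compose(g, f) == identity == compose(f, g) for g in S_X) for f in S_X
)

print(len(S_X), closed, associative, has_identity, has_inverses)
# 6 True True True True
```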

Remark. Here is one interesting use of bijections. It is easy to prove (see 
Exercise 2.11 on page 101) that two finite sets X and Y have the same number 
of elements if and only if there is a bijection $f : X \to Y$. This suggests the 
following definition, due to G. Cantor (1845-1918). 

Definition. Two (possibly infinite) sets X and Y have the same number of 
elements, denoted by $|X| = |Y|$, if there exists a bijection $f : X \to Y$. 

For example, a set X is called countable if either X is finite or X has the same 
number of elements as the natural numbers $\mathbb{N}$. If X is infinite and countable, 
then there is a bijection $f : \mathbb{N} \to X$; that is, there is a list $x_0, x_1, x_2, \ldots$, with 
no repetitions, of all the elements of X, where $x_n = f(n)$ for all $n \in \mathbb{N}$. Cantor 
proved that $\mathbb{R}$ is uncountable, that is, $\mathbb{R}$ is not countable. Thus, there are different 
sizes of infinity (in fact, there are infinitely many different sizes of infinity). 
The difference in size can be useful. For example, one calls a real number z 
algebraic if it is a root of some nonzero polynomial $f(x) = q_0 + q_1 x + \cdots + q_n x^n$, 
all of whose coefficients $q_0, q_1, \ldots, q_n$ are rational; one calls z transcendental 
if it is not algebraic. Of course, every rational r is algebraic, for it is a root of 
$x - r$. But irrational algebraic numbers do exist; for example, $\sqrt{2}$ is algebraic, 
being a root of $x^2 - 2$. Are there any transcendental numbers? One can prove 
that there are only countably many algebraic numbers, and so it follows from 
Cantor’s theorem, the uncountability of $\mathbb{R}$, that there exist (many) transcendental 
numbers. ◄ 

Remark. Why should we care about the target of a function when its image 
is more important? As a practical matter, when first defining a function, one 
usually does not know its image. For example, let $f : \mathbb{R} \to \mathbb{R}$ be defined by 

f(x) = \x\ e s] x 2 + sin 2 x. 

We must analyze f to find its image, and this is no small task. But if targets have 
to be images, then we could not even write down $f : X \to Y$ without having first 
found the image of f. Thus, targets are convenient to use. 

Part of the definition of equality of functions is that their targets are equal; 
changing the target changes the function. Suppose we do not do this. Consider a 
function $f : X \to Y$ that is not surjective, let $Y' = \operatorname{im} f$, and define $g : X \to Y'$ 
by $g(x) = f(x)$ for all $x \in X$. The functions f and g have the same domain 
and the same values (i.e., the same graph); they differ only in their targets. Now 
g is surjective. Had we decided that targets are not a necessary ingredient in the 
definition of a function, then we would not be able to distinguish between f, 
which is not surjective, and g, which is. It would then follow that every function 
is a surjection (this would not shake the foundations of mathematics, but it would 
force us into using cumbersome circumlocutions). ◄ 
If X and Y are sets, then a function $f : X \to Y$ defines a “forward motion,” 
direct image, carrying subsets of X into subsets of Y: if $S \subseteq X$, then 

$f(S) = \{y \in Y : y = f(s) \text{ for some } s \in S\}$. 

One calls $f(S)$ the direct image of S. A function f also defines a “backward 
motion,” inverse image, carrying subsets of Y into subsets of X: if $W \subseteq Y$, then 

$f^{-1}(W) = \{x \in X : f(x) \in W\}$. 

We are not assuming that f is a bijection, and so $f^{-1}$ does not mean the inverse 
function in this context. 
Here, $f^{-1}(W)$ means all those elements in X, if any, which f sends into W. 
One calls $f^{-1}(W)$ the inverse image of W. 

In Exercise 2.15 on page 102, it is shown that direct image preserves unions: 
if $f : X \to Y$ and if $\{S_i : i \in I\}$ is a family of subsets of X, then 
$f(\bigcup_{i \in I} S_i) = \bigcup_{i \in I} f(S_i)$. On the other hand, $f(S_1 \cap S_2) \ne f(S_1) \cap f(S_2)$ is possible. 
Exercise 2.16 on page 102 shows that inverse image is better behaved than direct 
image. 
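
Direct and inverse images are easy to compute for finite sets. The Python sketch below uses helper names of our own (direct_image, inverse_image) and the squaring function on $\{-3, \ldots, 3\}$ as an illustrative example; it shows a union being preserved and an intersection not being preserved.

```python
def direct_image(f, S):
    """f(S) = {f(s) : s in S}."""
    return {f(s) for s in S}

def inverse_image(f, X, W):
    """f^{-1}(W) = {x in X : f(x) in W}; X is the (finite) domain."""
    return {x for x in X if f(x) in W}

def square(x):
    return x * x

X = set(range(-3, 4))
S1, S2 = {-2, -1}, {1, 2}

# Direct image preserves unions ...
print(direct_image(square, S1 | S2)
      == direct_image(square, S1) | direct_image(square, S2))      # True
# ... but not intersections: here S1 and S2 are disjoint, yet their images meet.
print(direct_image(square, S1 & S2))                                # set()
print(sorted(direct_image(square, S1) & direct_image(square, S2)))  # [1, 4]

print(sorted(inverse_image(square, X, {4, 9})))                     # [-3, -2, 2, 3]
```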

Proposition 2.14. Let X and Y be sets, and let $f : X \to Y$ be a function. 

(i) If $T \subseteq S$ are subsets of X, then $f(T) \subseteq f(S)$, and if $U \subseteq V$ are subsets 
of Y, then $f^{-1}(U) \subseteq f^{-1}(V)$. 

(ii) If $U \subseteq Y$, then $f f^{-1}(U) \subseteq U$; if f is a surjection, then $f f^{-1}(U) = U$. 

(iii) If $S \subseteq X$, then $S \subseteq f^{-1} f(S)$, but strict inequality is possible. 

Proof. 

(i) If $y \in f(T)$, then $y = f(t)$ for some $t \in T$. But $t \in S$, because $T \subseteq S$, and 
so $f(t) \in f(S)$. Therefore, $f(T) \subseteq f(S)$. The other inclusion is proved just as 
easily. 

(ii) If $a \in f f^{-1}(U)$, then $a = f(x')$ for some $x' \in f^{-1}(U)$; this says that 
$a = f(x') \in U$. We prove the reverse inclusion when f is surjective. If $u \in U$, 
then there is $x \in X$ with $f(x) = u$; hence, $x \in f^{-1}(U)$, and so 
$u = f(x) \in f f^{-1}(U)$. 

(iii) If $s \in S$, then $f(s) \in f(S)$, and so $s \in f^{-1}(\{f(s)\}) \subseteq f^{-1} f(S)$. 

To see that there may be strict inequality, let $f : \mathbb{R} \to S^1$, where $S^1$ is the unit 
circle, be defined by $f(x) = e^{2\pi i x}$. If $A = \{0\}$, then $f(A) = \{1\}$ and 

$f^{-1} f(A) = f^{-1} f(\{0\}) = f^{-1}(\{1\}) = \mathbb{Z} \supsetneq A$. • 

Corollary 2.15. If $f : X \to Y$ is a surjection, then $B \mapsto f^{-1}(B)$ is an 
injection $\mathcal{P}(Y) \to \mathcal{P}(X)$, where $\mathcal{P}(Y)$ denotes the family of all the subsets of Y. 

Proof. If $B, C \subseteq Y$ and $f^{-1}(B) = f^{-1}(C)$, then Proposition 2.14(ii) gives 

$B = f f^{-1}(B) = f f^{-1}(C) = C$. • 

Equivalence Relations 

We are going to define the important notion of equivalence relation, but we begin 
with the general notion of relation. 

Definition. Given sets X and Y, a relation from X to Y is a subset R of $X \times Y$; 
if $X = Y$, then we say that R is a relation on X. One usually writes $xRy$ instead 
of $(x, y) \in R$. 

Here is a concrete example. Certainly, $\le$ should be a relation on $\mathbb{R}$; to see 
that it is, define the relation 

$R = \{(x, y) \in \mathbb{R} \times \mathbb{R} : (x, y) \text{ lies on or above the line } y = x\}$. 

The reader should check that $x \le y$ if and only if $(x, y) \in R$. 

Example 2.16. 

(i) Every function $f : X \to Y$ is a relation from X to Y. 

(ii) Equality is a relation on any set X. 

(iii) Congruence mod m is a relation on $\mathbb{Z}$. ◄ 

Definition. A relation $x \equiv y$ on a set X is 

reflexive if $x \equiv x$ for all $x \in X$; 
symmetric if $x \equiv y$ implies $y \equiv x$ for all $x, y \in X$; 
transitive if $x \equiv y$ and $y \equiv z$ imply $x \equiv z$ for all $x, y, z \in X$. 

A relation that has all three properties (reflexivity, symmetry, and transitivity) is 
called an equivalence relation. 

Example 2.17. 

(i) Ordinary equality is an equivalence relation on any set. 

(ii) If $m > 0$, then Proposition 1.54 says that $x \equiv y \bmod m$ is an equivalence
relation on $X = \mathbb{Z}$.

(iii) Let $X = \{(a, b) \in \mathbb{Z} \times \mathbb{Z} : b \neq 0\}$, and define a relation $\equiv$ on $X$ by
cross-multiplication:

\[ (a, b) \equiv (c, d) \quad\text{if}\quad ad = bc. \]

We claim that $\equiv$ is an equivalence relation. Verification of reflexivity and
symmetry is easy. For transitivity, assume that $(a, b) \equiv (c, d)$ and
$(c, d) \equiv (e, f)$. Now $ad = bc$ gives $adf = bcf$, and $cf = de$ gives $bcf = bde$;
thus, $adf = bde$. We may cancel the nonzero integer $d$ to get $af = be$;
that is, $(a, b) \equiv (e, f)$.

(iv) In calculus, equivalence relations are implicit in the discussion of vectors.
An arrow from a point $P$ to a point $Q$ can be denoted by the ordered pair
$(P, Q)$; call $P$ its foot and $Q$ its head. An equivalence relation on arrows
can be defined by saying that $(P, Q) \equiv (P', Q')$ if these arrows have the
same length and the same direction. More precisely, $(P, Q) \equiv (P', Q')$
if the quadrilateral obtained by joining $P$ to $P'$ and $Q$ to $Q'$ is a
parallelogram [this definition is incomplete, for one must also relate collinear
arrows as well as "degenerate" arrows $(P, P)$]. Note that the direction of an
arrow from $P$ to $Q$ is important; if $P \neq Q$, then $(P, Q) \not\equiv (Q, P)$. ◄
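The three defining properties can be checked mechanically on a finite set. Here is a minimal Python sketch for the cross-multiplication relation of Example 2.17(iii), restricted to a small window of integers; the function names `is_equivalence` and `cross` and the range $-3, \dots, 3$ are assumptions made only for illustration, and a finite check is of course no substitute for the proof.

```python
from itertools import product


def is_equivalence(X, rel):
    """Brute-force test of reflexivity, symmetry, and transitivity on a finite X."""
    X = list(X)
    reflexive = all(rel(x, x) for x in X)
    symmetric = all(rel(y, x) for x, y in product(X, X) if rel(x, y))
    transitive = all(
        rel(x, z)
        for x, y, z in product(X, X, X)
        if rel(x, y) and rel(y, z)
    )
    return reflexive and symmetric and transitive


def cross(p, q):
    """(a, b) is related to (c, d) when ad = bc."""
    (a, b), (c, d) = p, q
    return a * d == b * c


# Pairs (a, b) with b != 0, as in the definition of X.
pairs = [(a, b) for a in range(-3, 4) for b in range(-3, 4) if b != 0]
assert is_equivalence(pairs, cross)

# If b = 0 is allowed, transitivity fails:
# (1, 2) ~ (0, 0) and (0, 0) ~ (1, 3), but (1, 2) is not related to (1, 3).
all_pairs = [(a, b) for a in range(-3, 4) for b in range(-3, 4)]
assert not is_equivalence(all_pairs, cross)
```

The second check shows where the hypothesis $b \neq 0$ enters: it is what permits cancelling $d$ in the transitivity argument.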

An equivalence relation on a set X yields a family of subsets of X. 

Definition. Let $\equiv$ be an equivalence relation on a set $X$. If $a \in X$, the
equivalence class of $a$, denoted by $[a]$, is defined by

\[ [a] = \{x \in X : x \equiv a\} \subseteq X. \]

We now display equivalence classes arising from the equivalence relations 
given above. 

Example 2.18. 

(i) Let $\equiv$ be equality on a set $X$. If $a \in X$, then $[a] = \{a\}$, the subset having
only one element, namely, $a$. After all, if $x = a$, then $x$ and $a$ are equal!

(ii) Consider the relation of congruence mod $m$ on $\mathbb{Z}$, and let $a \in \mathbb{Z}$. The
congruence class of $a$ is defined by

\[ \{x \in \mathbb{Z} : x = a + km \text{ where } k \in \mathbb{Z}\}. \]

On the other hand, the equivalence class of $a$ is, by definition,

\[ \{x \in \mathbb{Z} : x \equiv a \bmod m\}. \]

Since $x \equiv a \bmod m$ if and only if $x = a + km$ for some $k \in \mathbb{Z}$, these two
subsets coincide; that is, the equivalence class $[a]$ is the congruence class.

(iii) The equivalence class of $(a, b)$ under cross-multiplication, where $a, b \in \mathbb{Z}$
and $b \neq 0$, is

\[ [(a, b)] = \{(c, d) : ad = bc\}. \]

If we denote $[(a, b)]$ by $a/b$, then this equivalence class is precisely the
fraction usually denoted by $a/b$. After all, it is plain that $(1, 2) \neq (2, 4)$,
but $[(1, 2)] = [(2, 4)]$; that is, $1/2 = 2/4$.

(iv) An equivalence class $[(P, Q)]$ of arrows, as in Example 2.17(iv), is called
a vector; we denote it by $[(P, Q)] = \overrightarrow{PQ}$. ◄

It is instructive to compare rational numbers and vectors, for both are defined
as equivalence classes. Every rational $a/b$ has a "favorite" name, its expression
in lowest terms; every vector has a favorite name, an arrow $(O, Q)$ with its foot
at the origin. Working with fractions in lowest terms is not always convenient;
for example, even if both $a/b$ and $c/d$ are in lowest terms, their sum $(ad + bc)/bd$
may not be in lowest terms. Vector addition is defined by the parallelogram law
(see Figure 2.5): $\overrightarrow{OP} + \overrightarrow{OQ} = \overrightarrow{OR}$, where $O$, $P$, $Q$, and $R$ are the vertices
of a parallelogram. But $\overrightarrow{OQ} = \overrightarrow{PR}$, because $(O, Q) \equiv (P, R)$, and it is more
natural to write $\overrightarrow{OP} + \overrightarrow{OQ} = \overrightarrow{OP} + \overrightarrow{PR} = \overrightarrow{OR}$.



Figure 2.5 Parallelogram Law 


Lemma 2.19. If $\equiv$ is an equivalence relation on a set $X$, then $x \equiv y$ if and
only if $[x] = [y]$.

Proof. Assume that $x \equiv y$. If $z \in [x]$, then $z \equiv x$, and so transitivity gives
$z \equiv y$; hence $[x] \subseteq [y]$. By symmetry, $y \equiv x$, and this gives the reverse
inclusion $[y] \subseteq [x]$. Thus, $[x] = [y]$.

Conversely, if $[x] = [y]$, then $x \in [x]$, by reflexivity, and so $x \in [x] = [y]$.
Therefore, $x \equiv y$. •

In words, this lemma says that one can replace equivalence by honest equality
at the cost of replacing elements by their equivalence classes.

Here is a set-theoretic idea that we shall show is intimately involved with 
equivalence relations. 

Definition. A family $\mathcal{P}$ of nonempty subsets of a set $X$ is called pairwise
disjoint if, for all $A, B \in \mathcal{P}$, either $A = B$ or $A \cap B = \varnothing$.

A partition of a set $X$ is a family of nonempty pairwise disjoint subsets,
called blocks, whose union is all of $X$.

Notice that if $X$ is a finite set and $A_1, A_2, \dots, A_n$ is a partition of $X$, then
$|X| = |A_1| + |A_2| + \cdots + |A_n|$.

We are now going to prove that equivalence relations and partitions are 
merely different views of the same thing. 

Proposition 2.20. If $\equiv$ is an equivalence relation on a set $X$, then the
equivalence classes form a partition of $X$. Conversely, given a partition $\mathcal{P}$ of $X$, there
is an equivalence relation on $X$ whose equivalence classes are the blocks in $\mathcal{P}$.

Proof. Assume that an equivalence relation $\equiv$ on $X$ is given. Each $x \in X$ lies
in the equivalence class $[x]$ because $\equiv$ is reflexive; it follows that the equivalence
classes are nonempty subsets whose union is $X$. To prove pairwise disjointness,
assume that $a \in [x] \cap [y]$, so that $a \equiv x$ and $a \equiv y$. By symmetry, $x \equiv a$,
and so transitivity gives $x \equiv y$. Therefore, $[x] = [y]$, by the lemma, and so the
equivalence classes form a partition of $X$.

Conversely, let $\mathcal{P}$ be a partition of $X$. If $x, y \in X$, define $x \equiv y$ if there is
$A \in \mathcal{P}$ with $x \in A$ and $y \in A$. It is plain that $\equiv$ is reflexive and symmetric. To
see that $\equiv$ is transitive, assume that $x \equiv y$ and $y \equiv z$; that is, there are $A, B \in \mathcal{P}$
with $x, y \in A$ and $y, z \in B$. Since $y \in A \cap B$, pairwise disjointness gives $A = B$
and so $x, z \in A$; that is, $x \equiv z$. We have shown that $\equiv$ is an equivalence relation.

It remains to show that the equivalence classes are the subsets in $\mathcal{P}$. If $x \in X$,
then $x \in A$ for some $A \in \mathcal{P}$. By definition of $\equiv$, if $y \in A$, then $y \equiv x$
and $y \in [x]$; hence, $A \subseteq [x]$. For the reverse inclusion, let $z \in [x]$, so that
$z \equiv x$. There is some $B$ with $x \in B$ and $z \in B$; thus, $x \in A \cap B$. By pairwise
disjointness, $A = B$, so that $z \in A$, and $[x] \subseteq A$. Hence, $[x] = A$. •
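Both directions of Proposition 2.20 can be watched in action on a finite set. The sketch below is illustrative only; the helper names `equivalence_classes`, `is_partition`, and `relation_from_partition` are our own, and the example relation is congruence mod 3 on $\{0, 1, \dots, 9\}$.

```python
def equivalence_classes(X, rel):
    """Group the elements of X into classes, comparing each x with one
    representative per class (legitimate by Lemma 2.19)."""
    classes = []
    for x in X:
        for block in classes:
            representative = next(iter(block))
            if rel(x, representative):
                block.add(x)
                break
        else:
            classes.append({x})
    return classes


def is_partition(X, blocks):
    """Nonempty blocks whose union is X and whose sizes add up to |X|
    (the last two conditions together force pairwise disjointness)."""
    universe = set(X)
    nonempty = all(blocks)
    covers = set().union(*blocks) == universe
    sizes_add_up = sum(len(b) for b in blocks) == len(universe)
    return nonempty and covers and sizes_add_up


def relation_from_partition(blocks):
    """x ~ y when some block contains both x and y."""
    def rel(x, y):
        return any(x in b and y in b for b in blocks)
    return rel


X = range(10)
classes = equivalence_classes(X, lambda x, y: (x - y) % 3 == 0)
assert classes == [{0, 3, 6, 9}, {1, 4, 7}, {2, 5, 8}]
assert is_partition(X, classes)

# Conversely, the relation built from these blocks has the same classes.
same_block = relation_from_partition(classes)
assert equivalence_classes(X, same_block) == classes
```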

Example 2.21. 

(i) If $\equiv$ is the identity relation on a set $X$, then the blocks are the 1-point
subsets of $X$.

(ii) If $X = \{(a, b) \in \mathbb{Z} \times \mathbb{Z} : b \neq 0\}$, then the equivalence class of $(a, b)$ [of
the equivalence relation given by cross-multiplication] is the fraction $a/b$.
Thus, $1/2$ is the class of $(a, 2a)$ for all nonzero $a \in \mathbb{Z}$. ◄

Given an equivalence relation on a set $X$, it is a common practice to
construct the set $\widetilde{X}$ whose elements are the equivalence classes $[x]$ of elements
$x \in X$. For example, if $X = \{(a, b) \in \mathbb{Z} \times \mathbb{Z} : b \neq 0\}$, then $\widetilde{X} = \mathbb{Q}$.
Does the formula $f(a/b) = a + b$ define a function $f\colon \mathbb{Q} \to \mathbb{Z}$; that is, is $f$
single-valued? The formula does define a relation from $\mathbb{Q}$ to $\mathbb{Z}$, but it does not
define a function; for example, $1/2 = 2/4$, but $f(1/2) = 3 \neq 6 = f(2/4)$.
Thus, $f$ is not single-valued, and it does not define a function (older texts
often call $f$ a multiple-valued function). The value $f(a/b)$ depends on the choice
of name $a/b$. In contrast, addition of rationals, $\alpha\colon \mathbb{Q} \times \mathbb{Q} \to \mathbb{Q}$, given by
$(a/b) + (c/d) = (ad + bc)/bd$, does define a function. Even though the
formula for $\alpha$ appears to depend on the choices of name for $a/b$ and $c/d$, it is
actually independent of such choices. The reader can prove that if $a/b = a'/b'$
and $c/d = c'/d'$, then $(ad + bc)/bd = (a'd' + b'c')/b'd'$. When the values of
a supposed function $f$ appear to depend on choices (for example, if the value
$f([x])$ seems to depend on the choice of representative $x$), then one is obliged to
prove independence of choices before declaring that $f$ is a (single-valued)
function. Checking whether an alleged function $f$ with domain $\widetilde{X}$ is single-valued is
often described as checking that $f$ is well-defined.
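The difference between a formula on names and a function on classes is easy to see numerically. The following Python sketch is an illustration under our own naming (`f_on_names`, `add_on_names`, `same_fraction`); it works with pairs $(a, b)$ as names for $a/b$ and uses the standard library's `fractions.Fraction` only to confirm that two names denote the same rational.

```python
from fractions import Fraction


def f_on_names(a, b):
    """The formula f(a/b) = a + b, applied to a particular name (a, b)."""
    return a + b


def add_on_names(a, b, c, d):
    """The formula (a/b) + (c/d) = (ad + bc)/bd, returned as a name (pair)."""
    return (a * d + b * c, b * d)


def same_fraction(p, q):
    """Cross-multiplication test: (a, b) and (c, d) name the same rational."""
    (a, b), (c, d) = p, q
    return a * d == b * c


# 1/2 and 2/4 are the same rational, yet the formula a + b disagrees on them.
assert Fraction(1, 2) == Fraction(2, 4)
assert f_on_names(1, 2) == 3
assert f_on_names(2, 4) == 6

# Addition is independent of the names chosen: 1/2 + 1/3 versus 2/4 + 3/9.
first = add_on_names(1, 2, 1, 3)    # the name (5, 6)
second = add_on_names(2, 4, 3, 9)   # the name (30, 36)
assert first != second              # different names ...
assert same_fraction(first, second) # ... for the same rational
```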


Exercises 

2.1 If $A$ and $B$ are subsets of a set $X$, prove that $A - B = A \cap B'$, where $B' = X - B$
is the complement of $B$.

*2.2 Let $A$ and $B$ be subsets of a set $X$. Prove the de Morgan laws:

\[ (A \cup B)' = A' \cap B' \quad\text{and}\quad (A \cap B)' = A' \cup B', \]

where $A' = X - A$ denotes the complement of $A$.

*2.3 If $A$ and $B$ are subsets of a set $X$, define their symmetric difference by

\[ A + B = (A - B) \cup (B - A). \]

(i) Prove that $A + B = (A \cup B) - (A \cap B)$.

(ii) Prove that $A + A = \varnothing$.

(iii) Prove that $A + \varnothing = A$.

(iv) Prove that $A + (B + C) = (A + B) + C$.

(v) Prove that $A \cap (B + C) = (A \cap B) + (A \cap C)$.
*2.4 Let $A$ and $B$ be sets, and let $a \in A$ and $b \in B$. Define their ordered pair as follows:

\[ (a, b) = \{a, \{a, b\}\}. \]

If $a' \in A$ and $b' \in B$, prove that $(a', b') = (a, b)$ if and only if $a' = a$ and $b' = b$.

2.5 Let $\Delta = \{(x, x) : x \in \mathbb{R}\}$; thus, $\Delta$ is the line in the plane which passes through the
origin and which makes an angle of 45° with the x-axis.

(i) If $P = (a, b)$ is a point in the plane with $a \neq b$, prove that $\Delta$ is the
perpendicular bisector of the segment $PP'$ having endpoints $P = (a, b)$
and $P' = (b, a)$.

(ii) If $f\colon \mathbb{R} \to \mathbb{R}$ is a bijection whose graph consists of certain points $(a, b)$
[of course, $b = f(a)$], prove that the graph of $f^{-1}$ is

\[ \{(b, a) : (a, b) \in f\}. \]

*2.6 Let $X$ and $Y$ be sets, and let $f\colon X \to Y$ be a function.

(i) If $S$ is a subset of $X$, prove that the restriction $f|S$ is equal to the
composite $f \circ i$, where $i\colon S \to X$ is the inclusion map.

(ii) If $\operatorname{im} f = A \subseteq Y$, prove that there exists a surjection $f'\colon X \to A$ with
$f = jf'$, where $j\colon A \to Y$ is the inclusion.

2.7 If $f\colon X \to Y$ has an inverse $g$, show that $g$ is a bijection.

*2.8 Show that if $f\colon X \to Y$ is a bijection, then it has exactly one inverse.

2.9 Show that $f\colon \mathbb{R} \to \mathbb{R}$, defined by $f(x) = 3x + 5$, is a bijection, and find its
inverse.

2.10 Determine whether $f\colon \mathbb{Q} \times \mathbb{Q} \to \mathbb{Q}$, given by

\[ f(a/b, c/d) = (a + c)/(b + d), \]

is a function.

*2.11 Let $X = \{x_1, \dots, x_m\}$ and $Y = \{y_1, \dots, y_n\}$ be finite sets, where the $x_i$ are distinct
and the $y_j$ are distinct. Show that there is a bijection $f\colon X \to Y$ if and only if
$|X| = |Y|$; that is, $m = n$.
*2.12 (Pigeonhole Principle)

(i) If $X$ and $Y$ are finite sets with the same number of elements, show that
the following conditions are equivalent for a function $f\colon X \to Y$:

(i) $f$ is injective;

(ii) $f$ is bijective;

(iii) $f$ is surjective.

(ii) Suppose there are 11 pigeons, each sitting in some pigeonhole. If there
are only 10 pigeonholes, prove that there is a hole containing more than
one pigeon.

*2.13 Let $f\colon X \to Y$ and $g\colon Y \to Z$ be functions.

(i) If both $f$ and $g$ are injective, prove that $g \circ f$ is injective.

(ii) If both $f$ and $g$ are surjective, prove that $g \circ f$ is surjective.

(iii) If both $f$ and $g$ are bijective, prove that $g \circ f$ is bijective.

(iv) If $g \circ f$ is a bijection, prove that $f$ is an injection and $g$ is a surjection.

2.14 (i) If $f\colon (-\pi/2, \pi/2) \to \mathbb{R}$ is defined by $a \mapsto \tan a$, then $f$ has an inverse
function $g$; indeed, $g = \arctan$.

(ii) Show that each of $\arcsin x$ and $\arccos x$ is an inverse function (of $\sin x$
and $\cos x$, respectively) as defined in this section. (Domains and targets
must be chosen with care.)

*2.15 (i) Let $f\colon X \to Y$ be a function, and let $\{S_i : i \in I\}$ be a family of subsets
of $X$. Prove that

\[ f\Bigl(\bigcup_{i \in I} S_i\Bigr) = \bigcup_{i \in I} f(S_i). \]

(ii) If $S_1$ and $S_2$ are subsets of a set $X$, and if $f\colon X \to Y$ is a function,
prove that $f(S_1 \cap S_2) \subseteq f(S_1) \cap f(S_2)$. Give an example in which

\[ f(S_1 \cap S_2) \neq f(S_1) \cap f(S_2). \]

(iii) If $S_1$ and $S_2$ are subsets of a set $X$, and if $f\colon X \to Y$ is an injection,
prove that $f(S_1 \cap S_2) = f(S_1) \cap f(S_2)$.

*2.16 Let $f\colon X \to Y$ be a function.

(i) If $B_i \subseteq Y$ is a family of subsets of $Y$, prove that

\[ f^{-1}\Bigl(\bigcup_i B_i\Bigr) = \bigcup_i f^{-1}(B_i) \quad\text{and}\quad f^{-1}\Bigl(\bigcap_i B_i\Bigr) = \bigcap_i f^{-1}(B_i). \]

(ii) If $B \subseteq Y$, prove that $f^{-1}(B') = f^{-1}(B)'$, where $B'$ denotes the
complement of $B$.

2.17 Let $f\colon X \to Y$ be a function. Define a relation on $X$ by $x \equiv x'$ if $f(x) = f(x')$.
Prove that $\equiv$ is an equivalence relation. If $x \in X$ and $f(x) = y$, the equivalence
class $[x]$ is usually denoted by $f^{-1}(y)$, the inverse image of $\{y\}$.

2.18 Let X = {rock, paper, scissors}. Recall the game whose rules are: paper dominates 
rock, rock dominates scissors, and scissors dominates paper. Draw a subset of 
X x X showing that domination is a relation on X. 
2.2 Permutations

In high school mathematics, the words permutation and arrangement are used 
interchangeably, if the word arrangement is used at all. We draw a distinction 
between them. 

Definition. A permutation of a set $X$ is a bijection $\alpha\colon X \to X$. If $X$ is a
finite set and $|X| = n$, then an arrangement of $X$ is a list $x_1, x_2, \dots, x_n$ with no
repetitions of all the elements of $X$.

Given an arrangement $x_1, x_2, \dots, x_n$, define $f\colon \{1, 2, \dots, n\} \to X$ by $f(i) = x_i$;
thus, the list $x_1, x_2, \dots, x_n$ displays the values of $f$. That there are no
repetitions on the list says that $f$ is injective, for $i \neq j$ implies
$x_i = f(i) \neq f(j) = x_j$; that every $x \in X$ occurs on the list says that $f$ is surjective. Thus,
an arrangement of $X$ defines a bijection $f\colon \{1, 2, \dots, n\} \to X$.

For example, there are six arrangements of $X = \{a, b, c\}$:

abc; acb; bac; bca; cab; cba.

All we can do with such lists is count their number, and there are exactly $n!$
arrangements of an n-element set $X$.
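The six arrangements can be generated and counted with Python's standard library; `itertools.permutations` produces exactly the lists without repetition described above. The variable names below are our own, and the last lines show how one arrangement determines a bijection $\{1, 2, 3\} \to X$.

```python
from itertools import permutations
from math import factorial

X = ["a", "b", "c"]

# Every arrangement of X, written as a word.
arrangements = ["".join(p) for p in permutations(X)]
assert arrangements == ["abc", "acb", "bac", "bca", "cab", "cba"]
assert len(arrangements) == factorial(len(X))

# The arrangement b, c, a viewed as the function f(i) = x_i on {1, 2, 3}.
f = dict(enumerate("bca", start=1))
assert f == {1: "b", 2: "c", 3: "a"}
assert sorted(f.values()) == X   # every element occurs, and only once
```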

If $X = \{1, 2, \dots, n\}$, then a permutation $\alpha\colon X \to X$ gives the list
$\alpha(1) = i_1, \alpha(2) = i_2, \dots, \alpha(n) = i_n$. We can use a two-rowed notation to denote this
permutation: if $\alpha(j)$ is the $j$th item on the list, then

\[ \alpha = \begin{pmatrix} 1 & 2 & \dots & j & \dots & n \\ \alpha(1) & \alpha(2) & \dots & \alpha(j) & \dots & \alpha(n) \end{pmatrix}. \]

Informally, arrangements (lists) and permutations (bijections) are simply different
ways of describing the same thing. The advantage of viewing permutations
as bijections, rather than as lists, is that they can now be composed and, by
Exercise 2.13(iii) on page 102, their composite is also a permutation.

The results in this section first appeared in an article of Cauchy in 1815. 


Definition. The family of all the permutations of a set $X$, denoted by $S_X$, is
called the symmetric group on $X$. When $X = \{1, 2, \dots, n\}$, $S_X$ is usually
denoted by $S_n$, and it is called the symmetric group on n letters.

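Notice that composition in $S_3$ is not commutative. If

\[ \alpha = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 1 \end{pmatrix} \quad\text{and}\quad \beta = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 3 \end{pmatrix}, \]

then their composites³ are

\[ \alpha \circ \beta = \begin{pmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{pmatrix} \quad\text{and}\quad \beta \circ \alpha = \begin{pmatrix} 1 & 2 & 3 \\ 1 & 3 & 2 \end{pmatrix}, \]

so that $\alpha \circ \beta \neq \beta \circ \alpha$ [for example, $\alpha \circ \beta\colon 1 \mapsto \alpha(\beta(1)) = \alpha(2) = 3$ while
$\beta \circ \alpha\colon 1 \mapsto 2 \mapsto 1$].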

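On the other hand, some permutations do commute; for example,

\[ \begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 1 & 3 & 4 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 1 & 2 & 3 & 4 \\ 1 & 2 & 4 & 3 \end{pmatrix} \]

commute, as the reader may check.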
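These computations are easily mechanized. In the Python sketch below (an illustration only; the dictionary representation and the name `compose` are our own choices), a permutation of $\{1, \dots, n\}$ is stored as a dictionary sending $i$ to its image, and `compose(alpha, beta)` means "first $\beta$, then $\alpha$," matching the convention of the text.

```python
def compose(alpha, beta):
    """Return the composite alpha o beta: apply beta first, then alpha."""
    return {i: alpha[beta[i]] for i in beta}


# Bottom rows of the two-rowed notation for alpha and beta in S_3.
alpha = {1: 2, 2: 3, 3: 1}
beta = {1: 2, 2: 1, 3: 3}

assert compose(alpha, beta) == {1: 3, 2: 2, 3: 1}
assert compose(beta, alpha) == {1: 1, 2: 3, 3: 2}
assert compose(alpha, beta) != compose(beta, alpha)

# Two permutations in S_4 that do commute.
p = {1: 2, 2: 1, 3: 3, 4: 4}
q = {1: 1, 2: 2, 3: 4, 4: 3}
assert compose(p, q) == compose(q, p)
```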

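Composition in $S_X$ satisfies the cancellation law:

\[ \text{if } \gamma \circ \alpha = \gamma \circ \beta, \text{ then } \alpha = \beta. \]

To see this,

\begin{align*}
\alpha &= 1_X \circ \alpha \\
 &= (\gamma^{-1} \circ \gamma) \circ \alpha \\
 &= \gamma^{-1} \circ (\gamma \circ \alpha) \\
 &= \gamma^{-1} \circ (\gamma \circ \beta) \\
 &= (\gamma^{-1} \circ \gamma) \circ \beta \\
 &= 1_X \circ \beta = \beta.
\end{align*}

A similar argument shows that

\[ \alpha \circ \gamma = \beta \circ \gamma \text{ implies } \alpha = \beta. \]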

Aside from being cumbersome, there is a major problem with the two-rowed 
notation for permutations. It hides the answers to elementary questions such 
as: is the square of a permutation the identity? what is the smallest positive 
integer $m$ so that the $m$th power of a permutation is the identity? can one factor
a permutation into simpler permutations? The special permutations introduced 
below will remedy this defect. 

Let us first simplify notation by writing $\beta\alpha$ instead of $\beta \circ \alpha$ and (1) instead
of $1_X$.

³ There are authors who multiply permutations differently, so that their $\alpha \circ \beta$ is our $\beta \circ \alpha$.
This is a consequence of their putting "functions on the right": instead of writing $\alpha(i)$ as we
do, they write $(i)\alpha$. Consider the composite of permutations $\alpha$ and $\beta$ in which we first apply
$\beta$ and then apply $\alpha$. We write $i \mapsto \beta(i) \mapsto \alpha(\beta(i))$. In the right-sided notation,
$i \mapsto (i)\beta \mapsto ((i)\beta)\alpha$. Thus, the notational switch causes a switch in the order of multiplication.
Definition. If $\alpha \in S_n$ and $i \in \{1, 2, \dots, n\}$, then $\alpha$ fixes $i$ if $\alpha(i) = i$, and $\alpha$
moves $i$ if $\alpha(i) \neq i$.


Definition. Let $i_1, i_2, \dots, i_r$ be distinct integers in $\{1, 2, \dots, n\}$. If $\alpha \in S_n$
fixes the other integers (if any) and if

\[ \alpha(i_1) = i_2,\ \alpha(i_2) = i_3,\ \dots,\ \alpha(i_{r-1}) = i_r,\ \alpha(i_r) = i_1, \]

then $\alpha$ is called an r-cycle. One also says that $\alpha$ is a cycle of length r.

A 2-cycle interchanges $i_1$ and $i_2$ and fixes everything else; 2-cycles are also
called transpositions. A 1-cycle is the identity, for it fixes every $i$; thus, all
1-cycles are equal: $(i) = (1)$ for all $i$.

Consider the permutation

\[ \alpha = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 4 & 3 & 1 & 5 & 2 \end{pmatrix}. \]

The two-rowed notation does not help us recognize that $\alpha$ is, in fact, a 5-cycle:
$\alpha(1) = 4$, $\alpha(4) = 5$, $\alpha(5) = 2$, $\alpha(2) = 3$, and $\alpha(3) = 1$. We now introduce new
notation: an r-cycle $\alpha$, as in the definition, shall be denoted by

\[ \alpha = (i_1\ i_2\ \dots\ i_r). \]

For example, the 5-cycle $\alpha$ above will be written $\alpha = (1\ 4\ 5\ 2\ 3)$. The reader
may check that

\[ \begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 3 & 4 & 1 \end{pmatrix} = (1\ 2\ 3\ 4), \]

\[ \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 5 & 1 & 4 & 2 & 3 \end{pmatrix} = (1\ 5\ 3\ 4\ 2), \]

and

\[ \begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 3 & 1 & 4 \end{pmatrix} = (1\ 2\ 3). \]

Notice that

\[ \beta = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 1 & 4 & 3 \end{pmatrix} \]

is not a cycle; in fact, $\beta = (1\ 2)(3\ 4)$. The term cycle comes from the Greek
word for circle. One can picture the cycle $(i_1\ i_2\ \dots\ i_r)$ as a clockwise rotation
of the circle (see Figure 2.8). Any $i_j$ can be taken as the "starting point," and so
there are r different cycle notations for any r-cycle:

\[ (i_1\ i_2\ \dots\ i_r) = (i_2\ i_3\ \dots\ i_r\ i_1) = \cdots = (i_r\ i_1\ i_2\ \dots\ i_{r-1}). \]
Figure 2.9 is a page from Cauchy's 1815 paper in which he introduces the
calculus of permutations. Notice that his notation for a cycle is a circle.

Let us now give an algorithm to factor a permutation into a product of cycles.
For example, take

\[ \alpha = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\ 6 & 4 & 7 & 2 & 5 & 1 & 8 & 9 & 3 \end{pmatrix}. \]

Begin by writing "(1." Now $\alpha\colon 1 \mapsto 6$, so write "(1 6." Next, $\alpha\colon 6 \mapsto 1$, and so
the parentheses close: $\alpha$ begins "(1 6)." The first number not having appeared is
2, and so we write "(1 6)(2." Now $\alpha\colon 2 \mapsto 4$, so we write "(1 6)(2 4." Since
$\alpha\colon 4 \mapsto 2$, the parentheses close once again, and we write "(1 6)(2 4)." The
smallest remaining number is 3; now $3 \mapsto 7$, $7 \mapsto 8$, $8 \mapsto 9$, and $9 \mapsto 3$; this
gives the 4-cycle (3 7 8 9). Finally, $\alpha(5) = 5$; we claim that

\[ \alpha = (1\ 6)(2\ 4)(3\ 7\ 8\ 9)(5). \]

Since multiplication in $S_n$ is composition of functions, our claim is that

\[ \alpha(i) = [(1\ 6)(2\ 4)(3\ 7\ 8\ 9)(5)](i) \]

for every $i$ between 1 and $n$ (after all, two functions $f$ and $g$ are equal if and
only if $f(i) = g(i)$ for every $i$ in their common domain). The right side is the
composite $\beta\gamma\delta$, where $\beta = (1\ 6)$, $\gamma = (2\ 4)$, and $\delta = (3\ 7\ 8\ 9)$ (actually, there
is also the 1-cycle (5), which we may ignore when we are evaluating, for (5) is
the identity function). Now $\alpha(1) = 6$; multiplication of permutations views the
permutations as functions and then takes their composite. For example, if $i = 1$,
then

\begin{align*}
\beta\gamma\delta(1) &= \beta(\gamma(\delta(1))) \\
 &= \beta(\gamma(1)) && \delta \text{ fixes } 1 \\
 &= \beta(1) && \gamma \text{ fixes } 1 \\
 &= 6.
\end{align*}

In Proposition 2.24, we will give a more satisfactory proof that $\alpha$ has been
factored as a product of cycles.

Figure 2.9 A page (p. 79) from A. Cauchy, Mémoire sur le nombre des valeurs qu'une
fonction peut acquérir lorsqu'on y permute de toutes les manières possibles les
quantités qu'elle renferme, J. de l'École Poly., XVIIe Cahier, Tome X (1815), pp. 1-28.
From: Oeuvres Complètes d'Augustin Cauchy, IIe Série, Tome I,
Gauthier-Villars, Paris, 1905.

Factorizations into cycles are very convenient for multiplication of permutations.
For example, in $S_5$, let us simplify the product

\[ \sigma = (1\ 2)(1\ 3\ 4\ 2\ 5)(2\ 5\ 1\ 3) \]

by displaying the "partial outputs" of the algorithm: $\sigma\colon 1 \mapsto 3 \mapsto 4 \mapsto 4$, so
that $\sigma$ begins (1 4. Next, $\sigma\colon 4 \mapsto 4 \mapsto 2 \mapsto 1$; hence, $\sigma$ begins (1 4). The
smallest number not yet considered is 2, and $\sigma\colon 2 \mapsto 5 \mapsto 1 \mapsto 2$; thus, $\sigma$
fixes 2, and $\sigma$ begins (1 4)(2). The smallest number not yet considered is 3, and
$\sigma\colon 3 \mapsto 2 \mapsto 5 \mapsto 5$. Finally, $\sigma\colon 5 \mapsto 1 \mapsto 3 \mapsto 3$, and we conclude that

\[ \sigma = (1\ 4)(2)(3\ 5). \]
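The "partial outputs" above amount to pushing each number through the factors from right to left. A minimal Python sketch of that bookkeeping follows; the names `cycle_to_map` and `cycle_product` are our own, cycles are written as tuples, and permutations are stored as dictionaries on $\{1, \dots, n\}$.

```python
def cycle_to_map(cycle, n):
    """The permutation of {1, ..., n} given by a single cycle (a tuple)."""
    mapping = {i: i for i in range(1, n + 1)}
    for k, i in enumerate(cycle):
        mapping[i] = cycle[(k + 1) % len(cycle)]
    return mapping


def cycle_product(cycles, n):
    """Multiply cycles as functions: the rightmost factor is applied first."""
    maps = [cycle_to_map(c, n) for c in cycles]
    result = {}
    for i in range(1, n + 1):
        j = i
        for m in reversed(maps):   # right to left, as in the text
            j = m[j]
        result[i] = j
    return result


sigma = cycle_product([(1, 2), (1, 3, 4, 2, 5), (2, 5, 1, 3)], 5)
assert sigma == {1: 4, 2: 2, 3: 5, 4: 1, 5: 3}

# The same permutation, written as the product (1 4)(2)(3 5).
assert sigma == cycle_product([(1, 4), (2,), (3, 5)], 5)
```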

In the factorization of a permutation into cycles, given by the algorithm 
above, one notes that the family of cycles is disjoint in the following sense. 

Definition. Two permutations $\alpha, \beta \in S_n$ are disjoint if every $i$ moved by one
is fixed by the other: if $\alpha(i) \neq i$, then $\beta(i) = i$, and if $\beta(j) \neq j$, then $\alpha(j) = j$.
A family $\beta_1, \dots, \beta_t$ of permutations is disjoint if each pair of them is disjoint.

Consider the special case of cycles. If $\alpha = (i_1\ i_2\ \dots\ i_r)$ and
$\beta = (j_1\ j_2\ \dots\ j_s)$, then any $k$ in the intersection $\{i_1, i_2, \dots, i_r\} \cap \{j_1, j_2, \dots, j_s\}$
is moved by both $\alpha$ and $\beta$. Thus, it is easy to see that two cycles are disjoint
if and only if $\{i_1, i_2, \dots, i_r\} \cap \{j_1, j_2, \dots, j_s\} = \varnothing$; that is, $\{i_1, i_2, \dots, i_r\}$ and
$\{j_1, j_2, \dots, j_s\}$ are disjoint sets.

When permutations $\alpha$ and $\beta$ are disjoint, there are exactly three distinct
possibilities for a number $i$: it is moved by $\alpha$, it is moved by $\beta$, or it is moved by
neither (that is, it is fixed by both).

Lemma 2.22. Disjoint permutations $\alpha, \beta \in S_n$ commute.

Proof. It suffices to prove that if $1 \le i \le n$, then $\alpha\beta(i) = \beta\alpha(i)$. If $\beta$ moves
$i$, say, $\beta(i) = j \neq i$, then $\beta$ also moves $j$ [otherwise, $\beta(j) = j$ and $\beta(i) = j$
contradicts $\beta$'s being an injection]; since $\alpha$ and $\beta$ are disjoint, $\alpha(i) = i$ and
$\alpha(j) = j$. Hence $\beta\alpha(i) = j = \alpha\beta(i)$. A similar argument shows that $\alpha\beta(i) =
\beta\alpha(i)$ if $\alpha$ moves $i$. The last possibility is that neither $\alpha$ nor $\beta$ moves $i$; in this
case, $\alpha\beta(i) = i = \beta\alpha(i)$. Therefore, $\alpha\beta = \beta\alpha$, by Proposition 2.2. •
In particular, disjoint cycles commute.

It is possible for permutations that are not disjoint to commute; for example,
the reader may check that (1 2 3)(4 5) and (1 3 2)(6 7) do commute. An
even simpler example arises from a permutation commuting with its powers:
$\alpha\alpha^2 = \alpha^2\alpha$.
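Both remarks can be checked by machine: disjoint permutations commute, while commuting permutations need not be disjoint. The sketch below is illustrative; the helper names (`perm_from_disjoint_cycles`, `compose`, `moved`, `are_disjoint`) are our own, and permutations are dictionaries on $\{1, \dots, 7\}$.

```python
def perm_from_disjoint_cycles(cycles, n):
    """The permutation of {1, ..., n} given by pairwise disjoint cycles."""
    perm = {i: i for i in range(1, n + 1)}
    for cycle in cycles:
        for k, i in enumerate(cycle):
            perm[i] = cycle[(k + 1) % len(cycle)]
    return perm


def compose(alpha, beta):
    """alpha o beta: apply beta first, then alpha."""
    return {i: alpha[beta[i]] for i in beta}


def moved(alpha):
    """The set of points moved by alpha."""
    return {i for i in alpha if alpha[i] != i}


def are_disjoint(alpha, beta):
    """Every point moved by one permutation is fixed by the other."""
    return moved(alpha).isdisjoint(moved(beta))


n = 7
a = perm_from_disjoint_cycles([(1, 2, 3), (4, 5)], n)
b = perm_from_disjoint_cycles([(1, 3, 2), (6, 7)], n)
c = perm_from_disjoint_cycles([(1, 2, 3)], n)
d = perm_from_disjoint_cycles([(4, 5)], n)
s = perm_from_disjoint_cycles([(1, 2)], n)
t = perm_from_disjoint_cycles([(2, 3)], n)

# Disjoint permutations commute (Lemma 2.22).
assert are_disjoint(c, d) and compose(c, d) == compose(d, c)

# a and b are not disjoint, yet they commute.
assert not are_disjoint(a, b) and compose(a, b) == compose(b, a)

# Overlapping transpositions need not commute.
assert not are_disjoint(s, t) and compose(s, t) != compose(t, s)
```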

Lemma 2.23. Let $X = \{1, 2, \dots, n\}$, let $\alpha \in S_X = S_n$, and, if $i_1 \in X$, define
$i_j$ for all $j \ge 1$ by induction: $i_{j+1} = \alpha(i_j)$. Write $Y = \{i_j : j \ge 1\}$, and let $Y'$
be the complement of $Y$.

(i) If $\alpha$ moves $i_1$, then there is $r > 1$ with $i_1, \dots, i_r$ all distinct and with
$i_{r+1} = \alpha(i_r) = i_1$.

(ii) $\alpha(Y) = Y$ and $\alpha(Y') = Y'$.

Proof.

(i) Since $X$ is finite, there is a smallest $r > 1$ with $i_1, \dots, i_r$ all distinct, but with
$i_{r+1} = \alpha(i_r) \in \{i_1, \dots, i_r\}$; that is, $\alpha(i_r) = i_j$ for $1 \le j \le r$. If $j > 1$, then
$\alpha(i_r) = i_j = \alpha(i_{j-1})$. But $\alpha$ is an injection, so that $i_r = i_{j-1}$, contradicting
$i_1, \dots, i_r$ all being distinct. Therefore, $\alpha(i_r) = i_1$.

(ii) It is obvious that $\alpha(Y) \subseteq Y$, for if $i_j \in Y$, then $\alpha(i_j) = i_{j+1} \in Y$. If
$k \in Y'$, then either $\alpha(k) \in Y$ or $\alpha(k) \in Y'$, for $Y'$ is the complement of $Y$, and
so $X = Y \cup Y'$. If $\alpha(k) \in Y$, then $\alpha(k) = i_j = \alpha(i_{j-1})$ for some $j$ (by part (i),
this is even true for $i_j = i_1$). Since $\alpha$ is injective, $k = i_{j-1} \in Y$, contradicting
$Y \cap Y' = \varnothing$. Therefore, $\alpha(Y') \subseteq Y'$.

We now show that the inclusions $\alpha(Y) \subseteq Y$ and $\alpha(Y') \subseteq Y'$ are actually
equalities. Now $\alpha(X) = \alpha(Y \cup Y') = \alpha(Y) \cup \alpha(Y')$, and this is a disjoint union
because $\alpha$ is an injection. But $\alpha(Y) \subseteq Y$ gives $|\alpha(Y)| \le |Y|$, and $\alpha(Y') \subseteq Y'$
gives $|\alpha(Y')| \le |Y'|$. If either of these inequalities is strict, then $|\alpha(X)| < |X|$.
But $\alpha(X) = X$, because $\alpha$ is a surjection, and this is a contradiction. •

The argument in the proof of Lemma 2.23(i) will be used again. 

Proposition 2.24. Every permutation $\alpha \in S_n$ is either a cycle or a product of
disjoint cycles.

Proof. The proof is by induction on the number $k \ge 0$ of points moved by $\alpha$.
The base step $k = 0$ is true, for $\alpha$ is now the identity, which is a 1-cycle.

If $k > 0$, let $i_1$ be a point moved by $\alpha$. As in Lemma 2.23, define
$Y = \{i_1, \dots, i_r\}$, where $i_1, \dots, i_r$ are all distinct, $\alpha(i_j) = i_{j+1}$ for $j < r$, and
$\alpha(i_r) = i_1$. Let $\sigma \in S_X$ be the r-cycle $(i_1\ i_2\ i_3\ \dots\ i_r)$, so that $\sigma$ fixes each point, if any,
in the complement $Y'$ of $Y$. If $r = n$, then $\alpha = \sigma$. If $r < n$, then $\alpha(Y') = Y'$,
as in the lemma. Define $\alpha' = \alpha\sigma^{-1}$; we claim that $\alpha'$ and $\sigma$ are disjoint. If $\sigma$
moves $i$, then $i = i_j \in Y$. But $\alpha'(i_j) = \alpha\sigma^{-1}(i_j) = \alpha(i_{j-1}) = i_j$; that is, $\alpha'$
fixes $i_j$. Suppose that $\alpha'$ moves $k$. We have just seen that $k \notin Y$, so that we may
assume that $k \in Y'$; but, by definition, $\sigma$ fixes every $k \in Y'$. Therefore, $\alpha = \alpha'\sigma$
is a factorization into disjoint permutations. The number of points moved by
$\alpha'$ is $k - r < k$, and so the inductive hypothesis gives $\alpha' = \beta_1 \cdots \beta_t$, where
$\beta_1, \dots, \beta_t$ are disjoint cycles. Therefore, $\alpha = \alpha'\sigma = \beta_1 \cdots \beta_t\sigma$ is a product of
disjoint cycles, as desired. •

We have just proved that the output of the algorithm on page 106 is always a 
product of disjoint cycles. 
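The factorization algorithm translates almost word for word into a short program. The following Python sketch is one possible rendering, not a canonical one: the name `complete_factorization` is ours, a permutation is a dictionary on $\{1, \dots, n\}$, and 1-cycles are kept so that every symbol appears exactly once in the output.

```python
def complete_factorization(alpha):
    """Factor a permutation (dict on {1, ..., n}) into disjoint cycles,
    starting each new cycle at the smallest number not yet seen."""
    cycles = []
    seen = set()
    for start in sorted(alpha):
        if start in seen:
            continue
        cycle = [start]
        seen.add(start)
        i = alpha[start]
        while i != start:          # follow start -> alpha(start) -> ... until it closes
            cycle.append(i)
            seen.add(i)
            i = alpha[i]
        cycles.append(tuple(cycle))
    return cycles


# The permutation with bottom row 6 4 7 2 5 1 8 9 3.
alpha = dict(zip(range(1, 10), [6, 4, 7, 2, 5, 1, 8, 9, 3]))
assert complete_factorization(alpha) == [(1, 6), (2, 4), (3, 7, 8, 9), (5,)]
```

That the inner loop terminates, and that the cycles found are disjoint, is the content of Lemma 2.23 and Proposition 2.24.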

Usually one suppresses the 1-cycles in this factorization [for 1-cycles equal
the identity (1)]. However, a factorization of $\alpha$ containing one 1-cycle for each $i$
fixed by $\alpha$, if any, will arise several times in the sequel.


Definition. A complete factorization of a permutation $\alpha$ is a factorization of $\alpha$
into disjoint cycles that contains one 1-cycle $(i)$ for every $i$ fixed by $\alpha$.


The factorization algorithm always yields a complete factorization. For
example, if

\[ \alpha = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 1 & 3 & 4 & 2 & 5 \end{pmatrix}, \]

then the algorithm gives $\alpha = (1)(2\ 3\ 4)(5)$, which is a complete factorization.
However, if one suppresses 1-cycles, the factorizations

\[ \alpha = (2\ 3\ 4) = (1)(2\ 3\ 4) = (2\ 3\ 4)(5) \]

are not complete factorizations. In a complete factorization $\alpha = \beta_1 \cdots \beta_t$, every
symbol $i$ between 1 and $n$ occurs in exactly one of the $\beta$'s.

There is a relation between an r-cycle $\beta$ and its powers $\beta^k$, where $\beta^k$ denotes
the composite of $\beta$ with itself $k$ times. We modify notation a bit for the next
observation; write $\beta = (i_0\ i_1\ \dots\ i_{r-1})$. Note that $i_1 = \beta(i_0)$, $i_2 = \beta(i_1) =
\beta(\beta(i_0)) = \beta^2(i_0)$, $i_3 = \beta(i_2) = \beta(\beta^2(i_0)) = \beta^3(i_0)$, and, for all $k \le r - 1$,

\[ i_k = \beta^k(i_0). \tag{1} \]

Since $\beta(i_{r-1}) = i_0$, it is easy to see that the equation $i_k = \beta^k(i_0)$ holds if
subscripts $j$ in the notation $i_j$ are taken mod $r$.


Lemma 2.25.

(i) Let $\alpha = \beta\delta$ be a factorization into disjoint permutations. If $\beta$ moves $i$,
then $\alpha^k(i) = \beta^k(i)$ for all $k \ge 1$.

(ii) If $\beta$ and $\gamma$ are cycles both of which move $i = i_0$, and if $\beta^k(i) = \gamma^k(i)$ for
all $k \ge 1$, then $\beta = \gamma$.

Remark. The hypothesis in (ii) does not assume that the cycles $\beta$ and $\gamma$ have
the same length, but this is part of the conclusion. ◄

Proof.

(i) Since $\beta$ moves $i$, disjointness implies that $\delta$ fixes $i$; indeed, every power of
$\delta$ fixes $i$. Now $\beta$ and $\delta$ commute, by Lemma 2.22, and so Exercise 2.27(i) on
page 121 gives $(\beta\delta)^k(i) = \beta^k(\delta^k(i)) = \beta^k(i)$, as desired.

(ii) By Eq. (1), if $\beta = (i_0\ i_1\ \dots\ i_{r-1})$, then $i_k = \beta^k(i_0)$ for all $k \le r - 1$.
Similarly, if $\gamma = (i_0\ j_1\ \dots\ j_{s-1})$, then $j_k = \gamma^k(i_0)$ for $k \le s - 1$. We may
assume that $r \le s$, so that $i_1 = j_1, \dots, i_{r-1} = j_{r-1}$. Since $j_r = \gamma^r(i_0) =
\beta^r(i_0) = i_0$, it follows that $s - 1 = r - 1$ and $j_k = i_k$ for all $k$. Therefore,
$\beta = (i_0\ i_1\ \dots\ i_{r-1}) = \gamma$. •

The next theorem is an analog of the fundamental theorem of arithmetic. 

Theorem 2.26. Let $\alpha \in S_n$ and let $\alpha = \beta_1 \cdots \beta_t$ be a complete factorization
into disjoint cycles. This factorization is unique except for the order in which the
cycles occur.

Proof. Let $\alpha = \gamma_1 \cdots \gamma_s$ be a second complete factorization of $\alpha$ into disjoint
cycles. Since every complete factorization of $\alpha$ has exactly one 1-cycle for each
$i$ fixed by $\alpha$, it suffices to prove, by induction on $\ell$, the larger of $t$ and $s$, that the
cycles of length $> 1$ are uniquely determined by $\alpha$.

The base step is true, for when $\ell = 1$, the hypothesis is $\beta_1 = \alpha = \gamma_1$.

To prove the inductive step, note first that if $\beta_t$ moves $i = i_0$, then $\beta_t^k(i_0) =
\alpha^k(i_0)$ for all $k \ge 1$, by Lemma 2.25(i). Now some $\gamma_j$ must move $i_0$; since
disjoint cycles commute, we may re-index so that $\gamma_s$ moves $i_0$. As in the first
paragraph, $\gamma_s^k(i_0) = \alpha^k(i_0)$ for all $k$. It follows from Lemma 2.25(ii) that
$\beta_t = \gamma_s$, and the cancellation law on page 104 gives $\beta_1 \cdots \beta_{t-1} = \gamma_1 \cdots \gamma_{s-1}$.
By the inductive hypothesis, $s = t$ and the $\gamma$'s can be reindexed so that $\gamma_1 =
\beta_1, \dots, \gamma_{t-1} = \beta_{t-1}$. •

Every permutation is a bijection; how do we find its inverse? In Figure 2.8,
the pictorial representation of a cycle $\beta$ as a clockwise rotation of a circle, the
inverse $\beta^{-1}$ is just a counterclockwise rotation.

Proposition 2.27.

(i) The inverse of the cycle $\alpha = (i_1\ i_2\ \dots\ i_r)$ is the cycle $(i_r\ i_{r-1}\ \dots\ i_1)$:

\[ (i_1\ i_2\ \dots\ i_r)^{-1} = (i_r\ i_{r-1}\ \dots\ i_1). \]

(ii) If $\gamma \in S_n$ and $\gamma = \beta_1 \cdots \beta_k$, then

\[ \gamma^{-1} = \beta_k^{-1} \cdots \beta_1^{-1} \]

(note that the order of the factors in $\gamma^{-1}$ has been reversed).

Proof.

(i) If $\alpha \in S_n$, we show that both composites are equal to (1). Now the composite
$(i_1\ i_2\ \dots\ i_r)(i_r\ i_{r-1}\ \dots\ i_1)$ fixes each integer between 1 and $n$, if any, other
than $i_1, \dots, i_r$. The composite also sends $i_1 \mapsto i_r \mapsto i_1$ while it acts on $i_j$, for
$j \ge 2$, by $i_j \mapsto i_{j-1} \mapsto i_j$. Thus, each integer between 1 and $n$ is fixed by the
composite, and so it is (1). A similar argument proves that the composite in the
other order is also equal to (1), from which it follows that

\[ (i_1\ i_2\ \dots\ i_r)^{-1} = (i_r\ i_{r-1}\ \dots\ i_1). \]

(ii) The proof is by induction on $k \ge 2$. For the base step $k = 2$, we have

\[ (\beta_1\beta_2)(\beta_2^{-1}\beta_1^{-1}) = \beta_1(\beta_2\beta_2^{-1})\beta_1^{-1} = \beta_1\beta_1^{-1} = (1). \]

Similarly, $(\beta_2^{-1}\beta_1^{-1})(\beta_1\beta_2) = (1)$.

For the inductive step, let $\delta = \beta_1 \cdots \beta_k$, so that $\beta_1 \cdots \beta_k\beta_{k+1} = \delta\beta_{k+1}$. Then

\begin{align*}
(\beta_1 \cdots \beta_k\beta_{k+1})^{-1} &= (\delta\beta_{k+1})^{-1} \\
 &= \beta_{k+1}^{-1}\delta^{-1} \\
 &= \beta_{k+1}^{-1}(\beta_1 \cdots \beta_k)^{-1} \\
 &= \beta_{k+1}^{-1}\beta_k^{-1} \cdots \beta_1^{-1}. \quad \bullet
\end{align*}

Thus, $(1\ 2\ 3\ 4)^{-1} = (4\ 3\ 2\ 1) = (1\ 4\ 3\ 2)$ and $(1\ 2)^{-1} = (2\ 1) = (1\ 2)$
(every transposition is equal to its own inverse).

Example 2.28. 

The result in Proposition 2.27 holds, in particular, if the factors are disjoint cycles
(in which case the reversal of the order of the factors is unnecessary because they
commute with one another, by Lemma 2.22). Thus, if

\[ \alpha = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\ 6 & 4 & 7 & 2 & 5 & 1 & 8 & 9 & 3 \end{pmatrix}, \]

then $\alpha = (1\ 6)(2\ 4)(3\ 7\ 8\ 9)(5)$ and

\begin{align*}
\alpha^{-1} &= (5)(9\ 8\ 7\ 3)(4\ 2)(6\ 1) \\
 &= (1\ 6)(2\ 4)(3\ 9\ 8\ 7). \quad ◄
\end{align*}
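The recipe "reverse each cycle and reverse the order of the factors" can be compared with the inverse computed directly from the function. The Python sketch below is illustrative and uses our own helper names (`perm_from_disjoint_cycles`, `inverse`, `inverse_cycles`); it treats only disjoint cycles, so building the permutation from its cycles needs no multiplication.

```python
def perm_from_disjoint_cycles(cycles, n):
    """The permutation of {1, ..., n} given by pairwise disjoint cycles."""
    perm = {i: i for i in range(1, n + 1)}
    for cycle in cycles:
        for k, i in enumerate(cycle):
            perm[i] = cycle[(k + 1) % len(cycle)]
    return perm


def inverse(alpha):
    """Inverse of a bijection stored as a dict: swap arguments and values."""
    return {j: i for i, j in alpha.items()}


def inverse_cycles(cycles):
    """Proposition 2.27: reverse every cycle and reverse the order of the factors."""
    return [tuple(reversed(c)) for c in reversed(cycles)]


cycles = [(1, 6), (2, 4), (3, 7, 8, 9), (5,)]
alpha = perm_from_disjoint_cycles(cycles, 9)

assert inverse_cycles(cycles) == [(5,), (9, 8, 7, 3), (4, 2), (6, 1)]
assert perm_from_disjoint_cycles(inverse_cycles(cycles), 9) == inverse(alpha)

# The tidier form (1 6)(2 4)(3 9 8 7) describes the same permutation.
assert perm_from_disjoint_cycles([(1, 6), (2, 4), (3, 9, 8, 7)], 9) == inverse(alpha)
```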

Definition. Two permutations $\alpha, \beta \in S_n$ have the same cycle structure if their
complete factorizations have the same number of r-cycles for each $r \ge 1$.
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According to Exercise 2.21 on page 120, there are 
(1 /r)[n(n - 1) •••(«- r + 1)] 

r-cycles in S n . This formula can be used to count the number of permutations 
having any given cycle structure if one is careful about factorizations having 
several cycles of the same length. For example, the number of permutations in 
54 with cycle structure ( a b)(c d) is 

j [j(4 x 3)] x [i(2 x 1)] = 3, 

the extra factor 4 occurring so that we do not count ( a b)(c d ) = (c d) (a b ) 
twice. Similarly, the number of permutations in S n of the form (a b)(c d)(e f ) 
is 

^3 [» (» - I ) (n - 2) (n - 3)0 - 4)0 - 5)] 

(see Exercise 2.21 on page 120). 
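These counts are small enough to confirm by brute force. The sketch below is only an illustration under stated assumptions: it lists every permutation of $\{0, 1, \ldots, n-1\}$ (zero-based symbols, for convenience with Python indexing) and tallies them by cycle lengths. The names `cycle_type`, `count_r_cycles` and `census` are hypothetical helpers, not notation from the text.

```python
from collections import Counter
from itertools import permutations
from math import factorial


def cycle_type(images):
    """Cycle lengths (largest first) of the permutation i -> images[i] on {0,...,n-1}."""
    seen, lengths = set(), []
    for start in range(len(images)):
        length, i = 0, start
        while i not in seen:
            seen.add(i)
            i = images[i]
            length += 1
        if length:                 # length is 0 when start was already visited
            lengths.append(length)
    return tuple(sorted(lengths, reverse=True))


def count_r_cycles(n, r):
    """The formula (1/r) * n(n-1)...(n-r+1) for the number of r-cycles in S_n."""
    return factorial(n) // factorial(n - r) // r


def census(n):
    """Number of permutations in S_n of each cycle structure."""
    return Counter(cycle_type(p) for p in permutations(range(n)))


print(census(4))                                        # tally for S_4
print(census(4)[(3, 1)], "3-cycles in S_4; formula gives", count_r_cycles(4, 3))
```

Here the key `(2, 2)` stands for the structure $(a\ b)(c\ d)$ and `(3, 1)` for a 3-cycle together with one fixed point.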

Example 2.29. 


Cycle Structure     Number

(1)                      1
(1 2)                    6
(1 2 3)                  8
(1 2 3 4)                6
(1 2)(3 4)               3

                        24

Table 2.1. Permutations in $S_4$ ◄


Example 2.30. 


Cycle Structure     Number

(1)                      1
(1 2)                   10
(1 2 3)                 20
(1 2 3 4)               30
(1 2 3 4 5)             24
(1 2)(3 4 5)            20
(1 2)(3 4)              15

                       120

Table 2.2. Permutations in $S_5$ ◄

After a lemma, we present a computational aid.

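Lemma 2.31. Let $\alpha, \gamma \in S_n$. For all $i$, if $\gamma : i \mapsto j$, then $\alpha\gamma\alpha^{-1} : \alpha(i) \mapsto \alpha(j)$.

Proof.

\[ \alpha\gamma\alpha^{-1}(\alpha(i)) = \alpha\gamma(i) = \alpha(j). \quad \bullet \]

Proposition 2.32. If $\gamma, \alpha \in S_n$, then $\alpha\gamma\alpha^{-1}$ has the same cycle structure as $\gamma$.
In more detail, if the complete factorization of $\gamma$ is

\[ \gamma = \beta_1\beta_2 \cdots (i\ j\ \ldots) \cdots \beta_t, \]

then $\alpha\gamma\alpha^{-1}$ is the permutation $\sigma$ which is obtained from $\gamma$ by applying $\alpha$ to the
symbols in the cycles of $\gamma$.

Remark. For example, if $\gamma = (1\ 3)(2\ 4\ 7)(5)(6)$ and $\alpha = (2\ 5\ 6)(1\ 4\ 3)$, then

\[ \alpha\gamma\alpha^{-1} = (\alpha 1\ \alpha 3)(\alpha 2\ \alpha 4\ \alpha 7)(\alpha 5)(\alpha 6) = (4\ 1)(5\ 3\ 7)(6)(2). \]  ◄

Proof. If $\gamma$ fixes $i$, then $\sigma$ fixes $\alpha(i)$ by its definition, and Lemma 2.31 shows
that $\alpha\gamma\alpha^{-1}$ fixes $\alpha(i)$ as well. Assume that
$\gamma$ moves a symbol $i$, say, $\gamma(i) = j$, so that one of the cycles in the complete
factorization of $\gamma$ is

\[ (i\ j\ \ldots). \]

By the definition of $\sigma$, one of its cycles is

\[ (\alpha(i)\ \alpha(j)\ \ldots); \]

that is, $\sigma : \alpha(i) \mapsto \alpha(j)$. But Lemma 2.31 says that $\alpha\gamma\alpha^{-1} : \alpha(i) \mapsto \alpha(j)$, so
that $\sigma$ and $\alpha\gamma\alpha^{-1}$ agree on all numbers of the form $\alpha(i)$. But every $k \in X$ has
the form $k = \alpha(i)$, because $\alpha : X \to X$ is a surjection, and so $\sigma = \alpha\gamma\alpha^{-1}$. •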
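Proposition 2.32 can be checked on the permutations of the Remark. The sketch below assumes permutations are dicts on $\{1, \ldots, n\}$ and that products are composed right to left, as in the text; `from_cycles`, `compose` and `inverse` are illustrative helper names.

```python
def from_cycles(cycs, n):
    """Permutation of {1,...,n} given by disjoint cycles; unlisted symbols are fixed."""
    perm = {i: i for i in range(1, n + 1)}
    for c in cycs:
        for x, y in zip(c, c[1:] + c[:1]):   # each entry is sent to the next one, cyclically
            perm[x] = y
    return perm


def compose(a, b):
    """Right-to-left composite: (a b)(i) = a(b(i))."""
    return {i: a[b[i]] for i in b}


def inverse(p):
    return {image: symbol for symbol, image in p.items()}


gamma_cycles = [(1, 3), (2, 4, 7)]
gamma = from_cycles(gamma_cycles, 7)
alpha = from_cycles([(2, 5, 6), (1, 4, 3)], 7)

# Left side: the conjugate alpha gamma alpha^{-1}, computed as a composite.
conjugate = compose(alpha, compose(gamma, inverse(alpha)))

# Right side: apply alpha to every symbol appearing in the cycles of gamma.
relabeled = from_cycles([tuple(alpha[x] for x in c) for c in gamma_cycles], 7)

print(conjugate == relabeled)
```

The relabeled cycles are $(4\ 1)$ and $(5\ 3\ 7)$, in agreement with the Remark.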

Proposition 2.33. If $\gamma, \gamma' \in S_n$, then $\gamma$ and $\gamma'$ have the same cycle structure if
and only if there exists $\alpha \in S_n$ with $\gamma' = \alpha\gamma\alpha^{-1}$.

Proof. Sufficiency has just been proved, in Proposition 2.32.

Conversely, assume that $\gamma$ and $\gamma'$ have the same cycle structure; that is,
$\gamma = \beta_1 \cdots \beta_t$ and $\gamma' = \sigma_1 \cdots \sigma_t$ are complete factorizations with $\beta_\lambda$ and $\sigma_\lambda$ having
the same length for all $\lambda \le t$. Let $\beta_\lambda = (i_1^\lambda, \ldots, i_{r(\lambda)}^\lambda)$ and $\sigma_\lambda = (j_1^\lambda, \ldots, j_{r(\lambda)}^\lambda)$.
Define

\[ \alpha(i_1^\lambda) = j_1^\lambda,\quad \alpha(i_2^\lambda) = j_2^\lambda,\quad \ldots,\quad \alpha(i_{r(\lambda)}^\lambda) = j_{r(\lambda)}^\lambda, \]

for all $\lambda$. Since $\beta_1 \cdots \beta_t$ is a complete factorization, every $i \in X = \{1, \ldots, n\}$
occurs in exactly one $\beta_\lambda$; hence, $\alpha(i)$ is defined for every $i \in X$, and $\alpha : X \to X$
is a (single-valued) function. Since every $j \in X$ occurs in some $\sigma_\lambda$, because
$\sigma_1 \cdots \sigma_t$ is a complete factorization, it follows that $\alpha$ is surjective. By Exercise 2.12
on page 102, $\alpha$ is a bijection, and so $\alpha \in S_n$. Proposition 2.32 says that
$\alpha\gamma\alpha^{-1}$ has the same cycle structure as $\gamma$ and the $\lambda$th cycle, for each $\lambda$, is

\[ (\alpha(i_1^\lambda)\ \alpha(i_2^\lambda)\ \cdots\ \alpha(i_{r(\lambda)}^\lambda)) = \sigma_\lambda. \]

Therefore, $\alpha\gamma\alpha^{-1} = \gamma'$. •


Example 2.34. 

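If

\[ \gamma = (1\ 2\ 3)(4\ 5)(6) \quad\text{and}\quad \gamma' = (2\ 5\ 6)(3\ 1)(4), \]

then $\gamma' = \alpha\gamma\alpha^{-1}$, where

\[ \alpha = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 2 & 5 & 6 & 3 & 1 & 4 \end{pmatrix} = (1\ 2\ 5)(3\ 6\ 4). \]

Note that there are other choices for $\alpha$ as well. ◄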
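The recipe in the proof of Proposition 2.33 (write the cycles of $\gamma'$ under the cycles of $\gamma$ and read off $\alpha$ column by column) can be sketched as follows. This is a hedged illustration: `conjugator` is a hypothetical helper, and it assumes the two lists pair up cycles of equal length in the same order.

```python
def from_cycles(cycs):
    """Permutation (as a dict) given by a complete list of disjoint cycles."""
    perm = {}
    for c in cycs:
        for x, y in zip(c, c[1:] + c[:1]):
            perm[x] = y
    return perm


def compose(a, b):
    """Right-to-left composite: (a b)(i) = a(b(i))."""
    return {i: a[b[i]] for i in b}


def inverse(p):
    return {image: symbol for symbol, image in p.items()}


def conjugator(cycles_g, cycles_h):
    """Return alpha with alpha g alpha^{-1} = h, matching the k-th cycles entry by entry."""
    alpha = {}
    for c, d in zip(cycles_g, cycles_h):
        if len(c) != len(d):
            raise ValueError("paired cycles must have the same length")
        alpha.update(zip(c, d))
    return alpha


g_cycles = [(1, 2, 3), (4, 5), (6,)]
h_cycles = [(2, 5, 6), (3, 1), (4,)]
g, h = from_cycles(g_cycles), from_cycles(h_cycles)

alpha = conjugator(g_cycles, h_cycles)
print(alpha)
print(compose(alpha, compose(g, inverse(alpha))) == h)

# Writing a cycle of h from a different starting point gives another valid alpha.
alpha2 = conjugator(g_cycles, [(5, 6, 2), (1, 3), (4,)])
print(compose(alpha2, compose(g, inverse(alpha2))) == h)
```

The second call illustrates the closing note of Example 2.34: rotating the way a cycle of $\gamma'$ is written changes $\alpha$ but not the conjugate.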

Here is another useful factorization of a permutation. 


Proposition 2.35. If $n \ge 2$, then every $\alpha \in S_n$ is a product of transpositions.

Proof. Of course, $(1) = (1\ 2)(1\ 2)$ is a product of transpositions, as is every
transposition: $(i\ j) = (i\ j)(1\ 2)(1\ 2)$. By Proposition 2.24, it suffices to factor
an $r$-cycle $\beta$ into a product of transpositions. This is done as follows. If $r = 1$,
then $\beta$ is the identity, and $\beta = (1\ 2)(1\ 2)$. If $r \ge 2$, then

\[ \beta = (1\ 2\ \ldots\ r) = (1\ r)(1\ r-1) \cdots (1\ 3)(1\ 2). \]

[One checks that this is an equality by evaluating each side. For example, the
left side $\beta$ sends $1 \mapsto 2$; each of $(1\ r), (1\ r-1), \ldots, (1\ 3)$ fixes 2, and so the
right side also sends $1 \mapsto 2$.] •
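The factorization in the proof can be tested directly. The sketch below assumes right-to-left composition, as in the text, and applies the same pattern to an arbitrary cycle $(i_1\ i_2\ \ldots\ i_r)$, which is a relabeling of the displayed case $(1\ 2\ \ldots\ r)$; the helper names are illustrative.

```python
def cycle_to_perm(cycle, n):
    """The cycle as a dict on {1,...,n}; symbols outside the cycle are fixed."""
    perm = {i: i for i in range(1, n + 1)}
    for x, y in zip(cycle, cycle[1:] + cycle[:1]):
        perm[x] = y
    return perm


def product(factors, n):
    """Composite of the listed cycles, with the rightmost factor applied first."""
    result = {i: i for i in range(1, n + 1)}
    for c in reversed(factors):
        factor = cycle_to_perm(c, n)
        result = {i: factor[result[i]] for i in result}
    return result


def transposition_factors(cycle):
    """(i1 i2 ... ir) = (i1 ir)(i1 i_{r-1}) ... (i1 i3)(i1 i2), listed left to right."""
    first = cycle[0]
    return [(first, x) for x in reversed(cycle[1:])]


beta = (1, 2, 3, 4, 5)
factors = transposition_factors(beta)
print(factors)
print(product(factors, 5) == cycle_to_perm(beta, 5))
```

An $r$-cycle is thus written with $r - 1$ transpositions, a count that becomes relevant once parity is introduced.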

Every permutation can thus be realized as a sequence of interchanges. Such a
factorization is not as nice as the factorization into disjoint cycles. First of all, the
transpositions occurring need not commute: $(1\ 2\ 3) = (1\ 3)(1\ 2) \ne (1\ 2)(1\ 3)$;
second, neither the factors themselves nor the number of factors are uniquely
determined. For example, here are some factorizations of $(1\ 2\ 3)$ in $S_4$:

\[ (1\ 2\ 3) = (1\ 3)(1\ 2) \]
\[ = (2\ 3)(1\ 3) \]
\[ = (1\ 3)(4\ 2)(1\ 2)(1\ 4) \]
\[ = (1\ 3)(4\ 2)(1\ 2)(1\ 4)(2\ 3)(2\ 3). \]

Is there any uniqueness at all in such a factorization? We now prove that the
parity of the number of factors is the same for all factorizations of a permutation
$\alpha$; that is, the number of transpositions is always even or always odd [as is
suggested by the factorizations of $\alpha = (1\ 2\ 3)$ displayed above].

Example 2.36. 

The 15-puzzle consists of a starting position, which is a $4 \times 4$ array of the numbers
between 1 and 15 and a symbol # (which we interpret as "blank"), and
simple moves. For example, consider the starting position shown below.

     3   15    4   12
    10   11    1    8
     2    5   13    9
     6    7   14    #

A simple move interchanges the blank with a symbol adjacent to it; for example,
there are two beginning simple moves for this starting position: either interchange
# and 14 or interchange # and 9. One wins the game if, after a sequence
of simple moves, the starting position is transformed into the standard array 1, 2,
3, ..., 15, #.

To analyze this game, note that the given array is really a permutation $\alpha$ of
$\{1, 2, \ldots, 15, \#\}$; that is, $\alpha \in S_{16}$. More precisely, if the spaces are labeled 1
through 15, #, then $\alpha(i)$ is the symbol occupying the $i$th square. For example,
the starting position given above is

\[ \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 & 13 & 14 & 15 & \# \\ 3 & 15 & 4 & 12 & 10 & 11 & 1 & 8 & 2 & 5 & 13 & 9 & 6 & 7 & 14 & \# \end{pmatrix}. \]

Each simple move is a special kind of transposition, namely, one that moves #.
Moreover, performing a simple move (corresponding to a special transposition
$\tau$) from a position (corresponding to a permutation $\beta$) yields a new position
corresponding to the permutation $\tau\beta$. For example, if $\alpha$ is the position above
and $\tau$ is the transposition interchanging 14 and #, then $\tau\alpha(\#) = \tau(\#) = 14$
and $\tau\alpha(15) = \tau(14) = \#$, while $\tau\alpha(i) = \alpha(i)$ for all other $i$. That is, the new
configuration has all the numbers in their original positions except for 14 and #
being interchanged. Therefore, to win the game, one needs special transpositions
$\tau_1, \tau_2, \ldots, \tau_m$ so that

\[ \tau_m \cdots \tau_2\tau_1\alpha = (1). \]

It turns out that there are some choices of $\alpha$ for which the game can be won, but
there are others for which it cannot be won, as we shall see in Example 2.42. ◄

The following discussion will enable us to analyze the 15-game.

Lemma 2.37. If $k, \ell \ge 0$ and the letters $a, b, c_i, d_j$ are all distinct, then

\[ (a\ b)(a\ c_1\ \ldots\ c_k\ b\ d_1\ \ldots\ d_\ell) = (a\ c_1\ \ldots\ c_k)(b\ d_1\ \ldots\ d_\ell) \]

and

\[ (a\ b)(a\ c_1\ \ldots\ c_k)(b\ d_1\ \ldots\ d_\ell) = (a\ c_1\ \ldots\ c_k\ b\ d_1\ \ldots\ d_\ell). \]

Proof. The left side of the first asserted equation sends

    $a \mapsto c_1 \mapsto c_1$;
    $c_i \mapsto c_{i+1} \mapsto c_{i+1}$ if $i < k$;
    $c_k \mapsto b \mapsto a$;
    $b \mapsto d_1 \mapsto d_1$;
    $d_j \mapsto d_{j+1} \mapsto d_{j+1}$ if $j < \ell$;
    $d_\ell \mapsto a \mapsto b$.

Similar evaluation of the right side shows that both permutations agree on $a$, $b$,
and all $c_i$, $d_j$. Since each side fixes all other numbers in $\{1, 2, \ldots, n\}$, if any,
both sides are equal.

For the second equation, reverse the first equation,

\[ (a\ c_1\ \ldots\ c_k)(b\ d_1\ \ldots\ d_\ell) = (a\ b)(a\ c_1\ \ldots\ c_k\ b\ d_1\ \ldots\ d_\ell), \]

and multiply both sides on the left by $(a\ b)$:

\[ (a\ b)(a\ c_1\ \ldots\ c_k)(b\ d_1\ \ldots\ d_\ell) = (a\ b)(a\ b)(a\ c_1\ \ldots\ c_k\ b\ d_1\ \ldots\ d_\ell) \]
\[ = (a\ c_1\ \ldots\ c_k\ b\ d_1\ \ldots\ d_\ell). \quad \bullet \]

An illustration of the lemma is

\[ (1\ 2)(1\ 3\ 4\ 2\ 5\ 6\ 7) = (1\ 3\ 4)(2\ 5\ 6\ 7). \]
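Both equations of Lemma 2.37 can be confirmed on this illustration, where $a = 1$, $b = 2$, the $c$'s are 3, 4 and the $d$'s are 5, 6, 7. The sketch assumes right-to-left composition as in the text; `cycle_to_perm` and `product` are illustrative helpers.

```python
def cycle_to_perm(cycle, n):
    """The cycle as a dict on {1,...,n}; symbols outside the cycle are fixed."""
    perm = {i: i for i in range(1, n + 1)}
    for x, y in zip(cycle, cycle[1:] + cycle[:1]):
        perm[x] = y
    return perm


def product(factors, n):
    """Composite of the listed cycles, with the rightmost factor applied first."""
    result = {i: i for i in range(1, n + 1)}
    for c in reversed(factors):
        factor = cycle_to_perm(c, n)
        result = {i: factor[result[i]] for i in result}
    return result


# First equation: multiplying a long cycle by (a b) splits it into two cycles.
split_lhs = product([(1, 2), (1, 3, 4, 2, 5, 6, 7)], 7)
split_rhs = product([(1, 3, 4), (2, 5, 6, 7)], 7)

# Second equation: multiplying two disjoint cycles by (a b) joins them.
join_lhs = product([(1, 2), (1, 3, 4), (2, 5, 6, 7)], 7)
join_rhs = product([(1, 3, 4, 2, 5, 6, 7)], 7)

print(split_lhs == split_rhs, join_lhs == join_rhs)
```

In words: a transposition $(a\ b)$ either splits one cycle into two or joins two cycles into one, so it changes the number of cycles by exactly 1.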

Definition. If $\alpha \in S_n$ and $\alpha = \beta_1 \cdots \beta_t$ is a complete factorization into disjoint
cycles, then signum$^4$ $\alpha$ is defined by

\[ \operatorname{sgn}(\alpha) = (-1)^{n-t}. \]

Theorem 2.26 shows that sgn is a (single-valued) function, for the number
$t$ is uniquely determined by $\alpha$. If $\varepsilon$ is a 1-cycle, then $\operatorname{sgn}(\varepsilon) = 1$, for $t = n$
and $(-1)^0 = 1$. If $\tau$ is a transposition, then it moves two numbers, and it fixes
each of the $n - 2$ other numbers; therefore, $t = 1 + (n - 2) = n - 1$, and so
$\operatorname{sgn}(\tau) = (-1)^{n-(n-1)} = -1$.

4 Signum is the Latin word for "mark" or "token"; of course, it has become the word sign.

Lemma 2.38. If $\alpha, \tau \in S_n$, where $\tau$ is a transposition, then

\[ \operatorname{sgn}(\tau\alpha) = -\operatorname{sgn}(\alpha). \]

Proof. Let $\alpha = \beta_1 \cdots \beta_t$ be a complete factorization of $\alpha$ into disjoint cycles,
and let $\tau = (a\ b)$. If $a$ and $b$ occur in the same $\beta$, say, in $\beta_1$, then $\beta_1 =
(a\ c_1\ \ldots\ c_k\ b\ d_1\ \ldots\ d_\ell)$, where $k, \ell \ge 0$. By Lemma 2.37,

\[ \tau\beta_1 = (a\ c_1\ \ldots\ c_k)(b\ d_1\ \ldots\ d_\ell). \]

This gives a complete factorization of $\tau\alpha = (\tau\beta_1)\beta_2 \cdots \beta_t$, for the cycles in it are
pairwise disjoint and every number in $\{1, 2, \ldots, n\}$ occurs in exactly one cycle.
Thus, $\tau\alpha$ has $t + 1$ cycles, for $\tau\beta_1$ splits into two disjoint cycles. Therefore,
$\operatorname{sgn}(\tau\alpha) = (-1)^{n-(t+1)} = -\operatorname{sgn}(\alpha)$.

The other possibility is that $a$ and $b$ occur in different cycles, say, $\beta_1 =
(a\ c_1\ \ldots\ c_k)$ and $\beta_2 = (b\ d_1\ \ldots\ d_\ell)$, where $k, \ell \ge 0$. But $\tau\alpha = (\tau\beta_1\beta_2)\beta_3 \cdots \beta_t$,
and Lemma 2.37 gives

\[ \tau\beta_1\beta_2 = (a\ c_1\ \ldots\ c_k\ b\ d_1\ \ldots\ d_\ell). \]

Therefore $\tau\alpha$ has a complete factorization with $t - 1$ cycles, and so $\operatorname{sgn}(\tau\alpha) =
(-1)^{n-(t-1)} = -\operatorname{sgn}(\alpha)$, as desired. •

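Theorem 2.39. For all $\alpha, \beta \in S_n$,

\[ \operatorname{sgn}(\alpha\beta) = \operatorname{sgn}(\alpha)\operatorname{sgn}(\beta). \]

Proof. Assume that $\alpha \in S_n$ is given and that $\alpha$ has a factorization as a product
of $m$ transpositions: $\alpha = \tau_1 \cdots \tau_m$. We prove, by induction on $m$, that $\operatorname{sgn}(\alpha\beta) =
\operatorname{sgn}(\alpha)\operatorname{sgn}(\beta)$ for every $\beta \in S_n$. The base step $m = 1$ is precisely Lemma 2.38,
for $m = 1$ says that $\alpha$ is a transposition. If $m > 1$, then the inductive hypothesis
applies to $\tau_2 \cdots \tau_m$, and so

\[ \operatorname{sgn}(\alpha\beta) = \operatorname{sgn}(\tau_1 \cdots \tau_m\beta) \]
\[ = -\operatorname{sgn}(\tau_2 \cdots \tau_m\beta) \quad\text{(Lemma 2.38)} \]
\[ = -\operatorname{sgn}(\tau_2 \cdots \tau_m)\operatorname{sgn}(\beta) \quad\text{(by induction)} \]
\[ = \operatorname{sgn}(\tau_1 \cdots \tau_m)\operatorname{sgn}(\beta) \quad\text{(Lemma 2.38)} \]
\[ = \operatorname{sgn}(\alpha)\operatorname{sgn}(\beta). \quad \bullet \]

It follows by induction on $k \ge 2$ that

\[ \operatorname{sgn}(\alpha_1\alpha_2 \cdots \alpha_k) = \operatorname{sgn}(\alpha_1)\operatorname{sgn}(\alpha_2) \cdots \operatorname{sgn}(\alpha_k). \]

Definition. A permutation $\alpha \in S_n$ is even if $\operatorname{sgn}(\alpha) = 1$, and $\alpha$ is odd if
$\operatorname{sgn}(\alpha) = -1$. We say that $\alpha$ and $\beta$ have the same parity if both are even or both
are odd.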

Let us return to factorizations of a permutation into a product of transpositions.
We saw on page 116 that there are many such factorizations of a permutation,
and the only common feature of these different factorizations appeared to
be the parity of the number of factors. To prove this apparent statement, one must
show that a permutation cannot be a product of an even number of transpositions
as well as a product of an odd number of transpositions.

Theorem 2.40.

(i) Let $\alpha \in S_n$. If $\alpha$ is even, then $\alpha$ is a product of an even number of
transpositions, and if $\alpha$ is odd, then $\alpha$ is a product of an odd number of
transpositions.

(ii) If $\alpha = \tau_1 \cdots \tau_q = \tau'_1 \cdots \tau'_p$ are factorizations into transpositions, then $q$
and $p$ have the same parity.

Proof.

(i) If $\alpha = \tau_1 \cdots \tau_q$ is a factorization of $\alpha$ into transpositions, then Theorem 2.39
gives $\operatorname{sgn}(\alpha) = \operatorname{sgn}(\tau_1) \cdots \operatorname{sgn}(\tau_q) = (-1)^q$, for we know that every transposition
is odd. Therefore, if $\alpha$ is even, that is, if $\operatorname{sgn}(\alpha) = 1$, then $q$ is even, while
if $\alpha$ is odd, that is, if $\operatorname{sgn}(\alpha) = -1$, then $q$ is odd.

(ii) If there were two factorizations of $\alpha$, one into an odd number of
transpositions and the other into an even number of transpositions, then $\operatorname{sgn}(\alpha)$ would
have two different values. •

Corollary 2.41. Let $\alpha, \beta \in S_n$. If $\alpha$ and $\beta$ have the same parity, then $\alpha\beta$ is
even, while if $\alpha$ and $\beta$ have distinct parity, then $\alpha\beta$ is odd.

Proof. If $\operatorname{sgn}(\alpha) = (-1)^q$ and $\operatorname{sgn}(\beta) = (-1)^p$, then Theorem 2.39 gives
$\operatorname{sgn}(\alpha\beta) = (-1)^{q+p}$, and the result follows. •
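For small $n$, Theorem 2.39 can be verified exhaustively. The sketch below assumes permutations of $\{0, 1, 2, 3\}$ stored as tuples of images (zero-based, for convenience) and computes sgn from the definition $(-1)^{n-t}$; the names `sgn`, `compose` and `S4` are illustrative.

```python
from itertools import permutations


def sgn(images):
    """sgn = (-1)^(n - t), where t counts the cycles (1-cycles included)."""
    n, seen, t = len(images), set(), 0
    for start in range(n):
        if start not in seen:
            t += 1                     # a new cycle begins at start
            i = start
            while i not in seen:
                seen.add(i)
                i = images[i]
    return (-1) ** (n - t)


def compose(a, b):
    """Right-to-left composite: (a b)(i) = a(b(i))."""
    return tuple(a[b[i]] for i in range(len(b)))


S4 = list(permutations(range(4)))

# Theorem 2.39 checked on all 24 * 24 ordered pairs.
multiplicative = all(sgn(compose(a, b)) == sgn(a) * sgn(b) for a in S4 for b in S4)
evens = sum(1 for a in S4 if sgn(a) == 1)

print(multiplicative, evens)
```

The count of even permutations printed at the end is exactly half of $4! = 24$.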

We return to the 15-game. 

Example 2.42. 

An analysis of the 15-puzzle in Example 2.36 shows that if $\alpha \in S_{16}$ is the starting
position, then the game can be won if and only if $\alpha$ is an even permutation that
fixes #. For a proof of this, we refer the reader to McCoy and Janusz, Introduction
to Modern Algebra. The proof in one direction is fairly clear, however. The blank
# starts in position 16. Each simple move takes # up, down, left, or right. Thus,
the total number $m$ of moves is $u + d + l + r$, where $u$ is the number of up moves,
etc. If # is to return home, each one of these must be undone: there must be the
same number of up moves as down moves, i.e., $u = d$, and the same number of
left moves as right moves, i.e., $r = l$. Thus, the total number of moves is even:
$m = 2u + 2r$. That is, if $\tau_m \cdots \tau_1\alpha = (1)$, then $m$ is even; hence, $\alpha = \tau_1 \cdots \tau_m$
(because $\tau^{-1} = \tau$ for every transposition $\tau$), and so $\alpha$ is an even permutation.
Armed with this theorem, one sees that the starting position $\alpha$ in Example 2.36
is, in cycle notation,

\[ \alpha = (1\ 3\ 4\ 12\ 9\ 2\ 15\ 14\ 7)(5\ 10)(6\ 11\ 13)(8)(\#), \]

where (8) and (#) are 1-cycles. Now $\operatorname{sgn}(\alpha) = (-1)^{16-5} = -1$, so that $\alpha$ is an
odd permutation; therefore, the game starting with $\alpha$ cannot be won. ◄
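The parity computation at the end of the example can be repeated by machine. In this sketch the blank # is encoded as the number 16 (an assumption made only so that all symbols are integers), and `passes_criterion` simply applies the criterion quoted in the example: $\alpha$ is even and fixes #. It does not search for a solving sequence of moves.

```python
BLANK = 16  # stands in for the symbol '#'


def count_cycles(perm):
    """Number t of cycles in the complete factorization (1-cycles included)."""
    seen, t = set(), 0
    for start in perm:
        if start not in seen:
            t += 1
            i = start
            while i not in seen:
                seen.add(i)
                i = perm[i]
    return t


def sgn(perm):
    """sgn = (-1)^(n - t)."""
    return (-1) ** (len(perm) - count_cycles(perm))


def as_permutation(position):
    """position[i-1] is the symbol on square i, read row by row."""
    return {i + 1: symbol for i, symbol in enumerate(position)}


def passes_criterion(position):
    """The criterion stated in the example: alpha is even and alpha(#) = #."""
    alpha = as_permutation(position)
    return alpha[BLANK] == BLANK and sgn(alpha) == 1


start = [3, 15, 4, 12, 10, 11, 1, 8, 2, 5, 13, 9, 6, 7, 14, BLANK]
alpha = as_permutation(start)
print(count_cycles(alpha), sgn(alpha), passes_criterion(start))
```

The starting position has five cycles out of sixteen symbols, so its sign is $(-1)^{11}$ and the criterion fails.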

Exercises 

*2.19 Find $\operatorname{sgn}(\alpha)$ and $\alpha^{-1}$, where

\[ \alpha = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\ 9 & 8 & 7 & 6 & 5 & 4 & 3 & 2 & 1 \end{pmatrix}. \]

2.20 If $\sigma \in S_n$ fixes some $j$, where $1 \le j \le n$ (that is, $\sigma(j) = j$), define $\sigma' \in S_X =
S_{n-1}$ (where $X = \{1, \ldots, \widehat{j}, \ldots, n\}$) by $\sigma'(i) = \sigma(i)$ for all $i \ne j$. Prove that

\[ \operatorname{sgn}(\sigma') = \operatorname{sgn}(\sigma). \]

*2.21 (i) If $1 < r \le n$, prove that there are

\[ \frac{1}{r}\,[n(n-1)\cdots(n-r+1)] \]

$r$-cycles in $S_n$.

(ii) If $kr \le n$, where $1 < r \le n$, prove that the number of $\alpha \in S_n$, where $\alpha$ is
a product of $k$ disjoint $r$-cycles, is

\[ \frac{1}{k!}\,\frac{1}{r^k}\,[n(n-1)\cdots(n-kr+1)]. \]

*2.22 (i) If $\alpha$ is an $r$-cycle, show that $\alpha^r = (1)$.

(ii) If $\alpha$ is an $r$-cycle, show that $r$ is the smallest positive integer $k$ such that
$\alpha^k = (1)$.

2.23 Show that an $r$-cycle is an even permutation if and only if $r$ is odd.

2.24 Given $X = \{1, 2, \ldots, n\}$, let us call a permutation $\tau$ of $X$ an adjacency if it is a
transposition of the form $(i\ \ i+1)$ for $i < n$. If $i < j$, prove that $(i\ j)$ is a product
of an odd number of adjacencies.

*2.25 Define $f : \{0, 1, 2, \ldots, 10\} \to \{0, 1, 2, \ldots, 10\}$ by

$f(n)$ = the remainder after dividing $4n^2 - 3n^7$ by 11.

(i) Show that $f$ is a permutation.

(ii) Compute the parity of $f$.

(iii) Compute the inverse of $f$.

2.26 (i) A permutation $\alpha \in S_n$ is regular if either $\alpha$ has no fixed points and it is
the product of disjoint cycles of the same length, or $\alpha = (1)$. Prove that
$\alpha$ is regular if and only if $\alpha$ is a power of an $n$-cycle.

(ii) Prove that if $\alpha$ is an $r$-cycle, then $\alpha^k$ is a product of $(r, k)$ disjoint cycles,
each of length $r/(r, k)$.

(iii) If $p$ is a prime, prove that every power of a $p$-cycle is either a $p$-cycle
or (1).

(iv) How many regular permutations are there in $S_5$? How many regular
permutations are there in $S_8$?

*2.27 (i) Prove that if $\alpha$ and $\beta$ are (not necessarily disjoint) permutations that
commute, then $(\alpha\beta)^k = \alpha^k\beta^k$ for all $k \ge 1$.

(ii) Give an example of two permutations $\alpha$ and $\beta$ for which $(\alpha\beta)^2 \ne \alpha^2\beta^2$.

*2.28 (i) Prove, for all $i$, that $\alpha \in S_n$ moves $i$ if and only if $\alpha^{-1}$ moves $i$.

(ii) Prove that if $\alpha, \beta \in S_n$ are disjoint and if $\alpha\beta = (1)$, then $\alpha = (1)$ and
$\beta = (1)$.

*2.29 If $n \ge 2$, prove that the number of even permutations in $S_n$ is $\tfrac{1}{2}n!$.

2.30 Give an example of $\alpha, \beta, \gamma \in S_5$, none of which is the identity (1), with $\alpha\beta = \beta\alpha$
and $\alpha\gamma = \gamma\alpha$, but with $\beta\gamma \ne \gamma\beta$.

*2.31 If $n \ge 3$, show that if $\alpha \in S_n$ commutes with every $\beta \in S_n$, then $\alpha = (1)$.

2.3 Groups 

Generalizations of the quadratic formula for finding the roots of cubic and quartic 
polynomials were discovered in the early 1500s. Over the next three centuries, 
many tried to find analogous formulas for the roots of higher-degree polynomials,
but in 1824, N. H. Abel (1802-1829) proved that there is no such formula
giving the roots of the general polynomial of degree 5. In 1831, E. Galois (1811-1832)
completely solved this problem by finding precisely which polynomials,
of arbitrary degree, admit such a formula for their roots. His fundamental idea 
involved his invention of the idea of group. Since Galois’s time, groups have 
arisen in many other areas of mathematics, for they are also the way to describe 
the notion of symmetry, as we will see later in this section and also in Chapter 6. 

The essence of a “product” is that two things are combined to form a third 
thing of the same kind. For example, ordinary multiplication, addition, and sub- 
traction combine two numbers to give another number, while composition com- 
bines two permutations to give another permutation. 

Definition. A (binary) operation on a set $G$ is a function

\[ * : G \times G \to G. \]

In more detail, an operation assigns an element $*(x, y)$ in $G$ to each ordered
pair $(x, y)$ of elements in $G$. It is more natural to write $x * y$ instead of $*(x, y)$;
thus, composition of functions is the function $(f, g) \mapsto g \circ f$, while multiplication,
addition, and subtraction are, respectively, the functions $(x, y) \mapsto xy$,
$(x, y) \mapsto x + y$, and $(x, y) \mapsto x - y$. The examples of composition and subtraction
show why we want ordered pairs, for $x * y$ and $y * x$ may be distinct.
As any function, an operation is single-valued; when one says this explicitly, it
is usually called the law of substitution:

If $x = x'$ and $y = y'$, then $x * y = x' * y'$.

Definition. A group is a set $G$ equipped with an operation $*$ and a special
element $e \in G$, called the identity, such that

(i) the associative law holds: for every $a, b, c \in G$,

\[ a * (b * c) = (a * b) * c; \]

(ii) $e * a = a$ for all $a \in G$;

(iii) for every $a \in G$, there is $a' \in G$ with $a' * a = e$.

By Proposition 2.13, the set $S_X$ of all permutations of a set $X$, with composition
as the operation and $1_X$ as the identity, is a group (the symmetric group
on $X$).
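For a finite set, the three axioms can be tested by exhaustion. The following is a minimal sketch: `is_group` is a hypothetical helper that also checks that the operation really lands in the set, and the second test case (subtraction modulo 3) is an illustrative non-example chosen here, not one taken from the text.

```python
from itertools import permutations, product


def is_group(elements, op, e):
    """Check closure and axioms (i)-(iii) of the definition by brute force."""
    elements = list(elements)
    members = set(elements)
    if not all(op(x, y) in members for x, y in product(elements, repeat=2)):
        return False  # op is not an operation on the set
    associative = all(op(a, op(b, c)) == op(op(a, b), c)
                      for a, b, c in product(elements, repeat=3))
    left_identity = all(op(e, a) == a for a in elements)
    left_inverses = all(any(op(b, a) == e for b in elements) for a in elements)
    return associative and left_identity and left_inverses


def compose(a, b):
    """Right-to-left composite of permutations stored as tuples of images."""
    return tuple(a[b[i]] for i in range(len(b)))


S3 = list(permutations(range(3)))
print(is_group(S3, compose, (0, 1, 2)))                      # symmetric group on 3 symbols
print(is_group(range(3), lambda x, y: (x - y) % 3, 0))       # subtraction is not associative
```

Note that the check uses only a left identity and left inverses, exactly as in the definition; Proposition 2.45 shows that the two-sided versions follow.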

We are now at the precise point when algebra becomes abstract algebra. In 
contrast to the concrete group $S_n$ consisting of all the permutations of the set $X =
\{1, 2, \ldots, n\}$ under composition, we will be proving general results about groups
without specifying either their elements or their operation. Thus, products of 
elements are not explicitly computable but are, instead, merely subject to certain 
rules. It will be seen that this approach is quite fruitful, for theorems now apply 
to many different groups, and it is more efficient to prove theorems once for all 
instead of proving them anew for each group encountered. For example, the next 
proposition and three lemmas give properties that hold in every group G. In 
addition to this obvious economy, it is often simpler to work with the “abstract” 
viewpoint even when dealing with a particular concrete group. For example, we 
will see that certain properties of S n are simpler to treat without recognizing that 
the elements in question are permutations (see Example 2.52). 

Definition. A group $G$ is called abelian$^5$ if it satisfies the commutative law:

$x * y = y * x$ holds for every $x, y \in G$.

The groups $S_n$, for $n \ge 3$, are not abelian because $(1\ 2)$ and $(1\ 3)$ are
elements of $S_n$ that do not commute: $(1\ 2)(1\ 3) = (1\ 3\ 2)$ and $(1\ 3)(1\ 2) =
(1\ 2\ 3)$.

5 This term honors N. H. Abel who proved a theorem, in 1827, equivalent to there being
a formula for the roots of a polynomial if its Galois group is commutative. This theorem is
virtually forgotten today, because it was superseded by a theorem of Galois around 1830.

We prove some basic facts before giving more examples of groups. 

How does one multiply three numbers? Given the expression $2 \times 3 \times 4$,
for example, one can first multiply $2 \times 3 = 6$, and then multiply $6 \times 4 = 24$.
Alternatively, one can first multiply $3 \times 4 = 12$ and then multiply $2 \times 12 = 24$;
of course, the two answers agree because multiplication of numbers is associative.
Not all operations are associative, however. For example, subtraction is not
associative: if $c \ne 0$, then

\[ a - (b - c) \ne (a - b) - c. \]

More generally, how does one multiply three elements $a * b * c$? Since one can
only multiply two elements, there is a choice: multiply $b * c$ to get a new element
of $G$, and now multiply this new element by $a$ to obtain $a * (b * c)$; or, one
can multiply $a * b$ and then multiply this new element by $c$ to obtain $(a * b) * c$.
Associativity says that both products are the same, $a * (b * c) = (a * b) * c$, and so it
is unambiguous to write $a * b * c$ without parentheses. The next lemma shows that
some associativity carries over to products with four factors (that associativity
allows us to dispense with parentheses for all products having $n \ge 3$ factors is
proved in Theorem 2.49).

Lemma 2.43. If $*$ is an associative operation on a set $G$, then

\[ (a * b) * (c * d) = [a * (b * c)] * d \]

for all $a, b, c, d \in G$.

Proof. If we write $g = a * b$, then $(a * b) * (c * d) = g * (c * d) = (g * c) * d =
[(a * b) * c] * d = [a * (b * c)] * d$. •

Lemma 2.44. If $G$ is a group and $a \in G$ satisfies $a * a = a$, then $a = e$.

Proof. There is $a' \in G$ with $a' * a = e$. Multiplying both sides on the left by $a'$
gives $a' * (a * a) = a' * a$. The right side is $e$, and the left side is $a' * (a * a) =
(a' * a) * a = e * a = a$, and so $a = e$. •

Proposition 2.45. Let $G$ be a group with operation $*$ and identity $e$.

(i) $a * a' = e$ for all $a \in G$.

(ii) $a * e = a$ for all $a \in G$.

(iii) If $e_0 \in G$ satisfies $e_0 * a = a$ for all $a \in G$, then $e_0 = e$.

(iv) Let $a \in G$. If $b \in G$ satisfies $b * a = e$, then $b = a'$.

Proof.

(i) We know that $a' * a = e$, and we now show that $a * a' = e$. By Lemma 2.43,

\[ (a * a') * (a * a') = [a * (a' * a)] * a' \]
\[ = (a * e) * a' \]
\[ = a * (e * a') \]
\[ = a * a'. \]

By Lemma 2.44, $a * a' = e$.

(ii) We use part (i).

\[ a * e = a * (a' * a) = (a * a') * a = e * a = a. \]

Therefore, $a * e = a$.

(iii) We now prove that a group has a unique identity element; that is, no other
element in $G$ shares its defining property $e * a = a$ for all $a \in G$. If $e_0 * a = a$
for all $a \in G$, then we have, in particular, $e_0 * e_0 = e_0$. By Lemma 2.44, $e_0 = e$.

(iv) In part (i), we proved that if $a' * a = e$, then $a * a' = e$. Now

\[ b = b * e = b * (a * a') = (b * a) * a' = e * a' = a'. \quad \bullet \]

In light of part (iv) of the proposition, for each $a \in G$, there is exactly one
element $a' \in G$ with $a' * a = e$.

Definition. If $G$ is a group and $a \in G$, then the unique element $a' \in G$ such
that $a' * a = e$ is called the inverse of $a$, and it is denoted by $a^{-1}$.

Here are three more properties holding in all groups. 

Lemma 2.46. Let $G$ be a group.

(i) The cancellation laws hold: if $a, b, x \in G$, and either $x * a = x * b$ or
$a * x = b * x$, then $a = b$.

(ii) $(a^{-1})^{-1} = a$ for all $a \in G$.

(iii) If $a, b \in G$, then

\[ (a * b)^{-1} = b^{-1} * a^{-1}. \]

More generally, for all $n \ge 2$,

\[ (a_1 * a_2 * \cdots * a_n)^{-1} = a_n^{-1} * \cdots * a_2^{-1} * a_1^{-1}. \]

Proof.

(i)

\[ a = e * a = (x^{-1} * x) * a = x^{-1} * (x * a) \]
\[ = x^{-1} * (x * b) = (x^{-1} * x) * b = e * b = b. \]

A similar proof, using $x * x^{-1} = e$, works when $x$ is on the right.

(ii) By Proposition 2.45(i), we have $a * a^{-1} = e$. But uniqueness of inverses,
Proposition 2.45(iv), says that $(a^{-1})^{-1}$ is the unique $x \in G$ such that $x * a^{-1} = e$.
Therefore, $(a^{-1})^{-1} = a$.

(iii) By Lemma 2.43,

\[ (a * b) * (b^{-1} * a^{-1}) = [a * (b * b^{-1})] * a^{-1} = (a * e) * a^{-1} = a * a^{-1} = e. \]

Hence, $(a * b)^{-1} = b^{-1} * a^{-1}$, by Proposition 2.45(iv). The second statement
follows by induction on $n \ge 2$. •

In the proofs just given, we have been very careful about justifying every 
step and displaying all parentheses, for we are only beginning to learn the ideas 
of group theory. As one becomes more adept, however, the need for explicitly 
writing all such details lessens. This does not mean that one is allowed to become 
careless; it only means that one is growing. Of course, you must always be 
prepared to supply omitted details if your proof is challenged. 

From now on, we will usually denote the product $a * b$ in a group by $ab$ (we
have already abbreviated $\alpha \circ \beta$ to $\alpha\beta$ in symmetric groups), and we will denote
the identity by 1 instead of by $e$. When a group is abelian, however, we will
often use additive notation. Here is the definition of group written in additive
notation.

An additive group is a set $G$ equipped with an operation $+$ and an identity
element $0 \in G$ such that

(i) $a + (b + c) = (a + b) + c$ for every $a, b, c \in G$;

(ii) $0 + a = a$ for all $a \in G$;

(iii) for every $a \in G$, there is $-a \in G$ with $(-a) + a = 0$.

Note that the inverse of $a$, in additive notation, is written $-a$ instead of $a^{-1}$.
We now give too many examples of groups (and there are more!). Glance
over the list and choose several that look interesting to you.


Example 2.47.

(i) We remind the reader that $S_X$, the set of all permutations of a set $X$, is a
group under composition. In particular, $S_n$, the set of all permutations of
$X = \{1, 2, \ldots, n\}$, is a group.

(ii) The set $\mathbb{Z}$ of all integers is an additive abelian group with $a * b = a + b$, with
identity $e = 0$, and with the inverse of an integer $n$ being $-n$. Similarly,
one can see that $\mathbb{Q}$, $\mathbb{R}$, and $\mathbb{C}$ are additive abelian groups.

(iii) The set $\mathbb{Q}^\times$ of all nonzero rationals is an abelian group, where $*$ is ordinary
multiplication, the number 1 is the identity, and the inverse of $r \in \mathbb{Q}^\times$ is
$1/r$. Similarly, $\mathbb{R}^\times$ is a multiplicative abelian group. We show, in the next
example, that $\mathbb{C}^\times$ is also a multiplicative group.

Note that $\mathbb{Z}^\times$ is not a group, for none of its elements (aside from $\pm 1$)
has a multiplicative inverse in $\mathbb{Z}^\times$.

(iv) The nonzero complex numbers $\mathbb{C}^\times$ form an abelian group under multiplication.
It is easy to see that multiplication is an associative operation
and that 1 is the identity. Here is the simplest way to find inverses. If
$z = a + ib \in \mathbb{C}$, where $a, b \in \mathbb{R}$, define its complex conjugate $\bar{z} = a - ib$.
Note that $z\bar{z} = a^2 + b^2$, so that $z \ne 0$ if and only if $z\bar{z} \ne 0$. If $z \ne 0$, then

\[ z^{-1} = 1/z = \bar{z}/z\bar{z} = (a/z\bar{z}) - (b/z\bar{z})i. \]

(v) The circle $S^1$ of radius 1 with center the origin can be made into a
multiplicative abelian group if we regard its points as complex numbers of
modulus 1. The circle group is defined by

\[ S^1 = \{z \in \mathbb{C} : |z| = 1\}, \]

where the operation is multiplication of complex numbers; that this is an
operation on $S^1$ follows from Corollary 1.20. Of course, complex multiplication
is associative, the identity is 1 (which has modulus 1), and the
inverse of any complex number of modulus 1 is its complex conjugate,
which also has modulus 1. Therefore, $S^1$ is a group. Even though $S^1$ is an
abelian group, we still write it multiplicatively, for it would be confusing
to write it additively.

(vi) For any positive integer $n$, let

\[ \Gamma_n = \{\zeta^k : 0 \le k < n\} \]

be the set of all the $n$th roots of unity, where

\[ \zeta = e^{2\pi i/n} = \cos(2\pi/n) + i\sin(2\pi/n). \]

The reader may use De Moivre's theorem to see that $\Gamma_n$ is an abelian group
with operation multiplication of complex numbers; moreover, the inverse
of any root of unity is its complex conjugate.

(vii) The plane $\mathbb{R} \times \mathbb{R}$ is an additive abelian group with operation vector addition;
that is, if $v = (x, y)$ and $v' = (x', y')$, then $v + v' = (x + x', y + y')$.
The identity is the origin $O = (0, 0)$, and the inverse of $v = (x, y)$ is
$-v = (-x, -y)$.

(viii) The parity group $\mathcal{P}$ has two elements, the words "even" and "odd," with
operation

    even + even = even = odd + odd

and

    even + odd = odd = odd + even.

The reader may show that $\mathcal{P}$ is an abelian group.

(ix) Let $X$ be a set. Recall that if $A$ and $B$ are subsets of $X$, then their symmetric
difference is $A + B = (A - B) \cup (B - A)$ (symmetric difference is pictured
in Figure 2.6). The Boolean group $\mathcal{B}(X)$ [named after the logician G.
Boole (1815-1864)] is the family of all the subsets of $X$ equipped with
addition given by symmetric difference.

It is plain that $A + B = B + A$, so that symmetric difference is commutative.
The identity is $\varnothing$, the empty set, and the inverse of $A$ is $A$ itself,
for $A + A = \varnothing$. (See Exercise 2.3 on page 100.) Thus, $\mathcal{B}(X)$ is an abelian
group. ◄
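Two of the examples above lend themselves to a quick numerical check. The sketch below is illustrative only: it tests the Boolean group on the three-element set $X = \{1, 2, 3\}$ using Python's `frozenset` symmetric difference `^`, and it tests the roots of unity $\Gamma_6$ in floating point, so equalities there hold only up to a small tolerance (the value `1e-9` is an arbitrary choice).

```python
import cmath
from itertools import combinations, product


def power_set(xs):
    """All subsets of xs, as frozensets."""
    xs = list(xs)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]


# Boolean group B(X) for X = {1, 2, 3}: addition is symmetric difference.
B = power_set({1, 2, 3})
empty = frozenset()
associative = all((a ^ b) ^ c == a ^ (b ^ c) for a, b, c in product(B, repeat=3))
commutative = all(a ^ b == b ^ a for a, b in product(B, repeat=2))
self_inverse = all(a ^ a == empty for a in B)
print(len(B), associative, commutative, self_inverse)


def roots_of_unity(n):
    """The n-th roots of unity zeta^k, 0 <= k < n, with zeta = exp(2 pi i / n)."""
    zeta = cmath.exp(2j * cmath.pi / n)
    return [zeta ** k for k in range(n)]


def close(z, w, tol=1e-9):
    return abs(z - w) < tol


gamma6 = roots_of_unity(6)
# The inverse of each root is its complex conjugate, and products stay in the set.
conjugate_is_inverse = all(close(z * z.conjugate(), 1) for z in gamma6)
closed = all(any(close(z * w, u) for u in gamma6) for z in gamma6 for w in gamma6)
print(conjugate_is_inverse, closed)
```

Exact arithmetic is used for the subsets, so the first line of output is a genuine verification for this one set $X$; the second is numerical evidence rather than a proof.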

Example 2.48.

(i) A ($2 \times 2$ real) matrix$^6$ $A$ is

\[ A = \begin{bmatrix} a & c \\ b & d \end{bmatrix}, \]

where $a, b, c, d \in \mathbb{R}$. If

\[ B = \begin{bmatrix} w & y \\ x & z \end{bmatrix}, \]

then the product $AB$ is defined by

\[ AB = \begin{bmatrix} a & c \\ b & d \end{bmatrix}\begin{bmatrix} w & y \\ x & z \end{bmatrix} = \begin{bmatrix} aw + cx & ay + cz \\ bw + dx & by + dz \end{bmatrix}. \]

The elements $a, b, c, d$ are called the entries of $A$. Call $(a, c)$ the first row
of $A$ and call $(b, d)$ the second row; call $(a, b)$ the first column of $A$ and
call $(c, d)$ the second column. Thus, each entry of the product $AB$ is a dot
product of a row of $A$ with a column of $B$. The determinant of $A$, denoted
by $\det(A)$, is the number $ad - bc$, and a matrix $A$ is called nonsingular if
$\det(A) \ne 0$. The reader may calculate that

\[ \det(AB) = \det(A)\det(B), \]

from which it follows that the product of nonsingular matrices is itself
nonsingular. The set $\mathrm{GL}(2, \mathbb{R})$ of all nonsingular matrices, with operation
matrix multiplication, is a (nonabelian) group, called the $2 \times 2$ real general
linear group: the identity is the identity matrix

\[ I = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \]

and the inverse of a nonsingular matrix $A$ is

\[ A^{-1} = \begin{bmatrix} d/\Delta & -c/\Delta \\ -b/\Delta & a/\Delta \end{bmatrix}, \]

where $\Delta = ad - bc = \det(A)$. (The proof of associativity is routine,
though tedious; a "clean" proof of associativity can be given once one
knows the relation between matrices and linear transformations [see
Corollary 4.71].)

6 The word matrix (derived from the word meaning "mother") means "womb" in Latin;
more generally, it means something that contains the essence of a thing. Its mathematical
usage arises because a $2 \times 2$ matrix, which is an array of four numbers, completely describes
a certain type of function $\mathbb{R}^2 \to \mathbb{R}^2$ called a linear transformation (more generally, larger
matrices contain the essence of linear transformations between higher-dimensional spaces).

(ii) The previous example can be modified in two ways. First, we may allow
the entries to lie in $\mathbb{Q}$ or in $\mathbb{C}$, giving the groups $\mathrm{GL}(2, \mathbb{Q})$ or $\mathrm{GL}(2, \mathbb{C})$. We
may even allow the entries to be in $\mathbb{Z}$, in which case $\mathrm{GL}(2, \mathbb{Z})$ is defined
to be the set of all such matrices with determinant $\pm 1$ (one wants all the
entries of $A^{-1}$ to be in $\mathbb{Z}$). For readers familiar with linear algebra, all
nonsingular $n \times n$ matrices form a group $\mathrm{GL}(n, \mathbb{R})$ under multiplication.

(iii) All special$^7$ orthogonal matrices, that is, all matrices of the form

\[ A = \begin{bmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{bmatrix}, \]

form a group denoted by $\mathrm{SO}(2, \mathbb{R})$, called the $2 \times 2$ special orthogonal
group. Let us show that matrix multiplication is an operation on $\mathrm{SO}(2, \mathbb{R})$.
The product

\[ \begin{bmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{bmatrix}\begin{bmatrix} \cos\beta & -\sin\beta \\ \sin\beta & \cos\beta \end{bmatrix} \]

is

\[ \begin{bmatrix} \cos\alpha\cos\beta - \sin\alpha\sin\beta & -[\cos\alpha\sin\beta + \sin\alpha\cos\beta] \\ \sin\alpha\cos\beta + \cos\alpha\sin\beta & \cos\alpha\cos\beta - \sin\alpha\sin\beta \end{bmatrix}. \]

The addition theorem for sine and cosine shows that this product is again
a special orthogonal matrix, for it is

\[ \begin{bmatrix} \cos(\alpha + \beta) & -\sin(\alpha + \beta) \\ \sin(\alpha + \beta) & \cos(\alpha + \beta) \end{bmatrix}. \]

In fact, this calculation shows that $\mathrm{SO}(2, \mathbb{R})$ is abelian. It is clear that
the identity matrix is special orthogonal, and we let the reader check that
the inverse of a special orthogonal matrix (which exists because special
orthogonal matrices have determinant 1) is also special orthogonal.

In Exercise 2.67 on page 166, we will see that $\mathrm{SO}(2, \mathbb{R})$ is a disguised
version of the circle group $S^1$, and that this group consists of all the
rotations of the plane about the origin.

7 The adjective special applied to a matrix usually means that its determinant is 1.

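(iv) The affine 8 group $\mathrm{Aff}(1, \mathbb{R})$ consists of all functions $\mathbb{R} \to \mathbb{R}$ (called affine
maps) of the form

\[ f_{a,b}(x) = ax + b, \]

where $a$ and $b$ are fixed real numbers with $a \neq 0$. Let us check that
$\mathrm{Aff}(1, \mathbb{R})$ is a group under composition. If $f_{c,d}(x) = cx + d$, then

\begin{align*}
f_{a,b} f_{c,d}(x) &= f_{a,b}(cx + d) \\
&= a(cx + d) + b \\
&= acx + (ad + b) \\
&= f_{ac,\,ad+b}(x).
\end{align*}

Since $ac \neq 0$, the composite is an affine map. The identity function
$1_{\mathbb{R}} \colon \mathbb{R} \to \mathbb{R}$ is an affine map ($1_{\mathbb{R}} = f_{1,0}$), while the inverse of $f_{a,b}$ is
easily seen to be $f_{a^{-1},\,-a^{-1}b}$. The reader should note that this composition
is reminiscent of matrix multiplication:

\[
\begin{bmatrix} a & b \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} c & d \\ 0 & 1 \end{bmatrix}
=
\begin{bmatrix} ac & ad + b \\ 0 & 1 \end{bmatrix}.
\]

Similarly, replacing $\mathbb{R}$ by $\mathbb{Q}$ gives the group $\mathrm{Aff}(1, \mathbb{Q})$, and replacing $\mathbb{R}$ by
$\mathbb{C}$ gives the group $\mathrm{Aff}(1, \mathbb{C})$. ◄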
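
The comparison between composing affine maps and multiplying matrices can be checked mechanically. The Python sketch below is an illustration only: the helper names (`affine`, `evaluate`, `compose`, `inverse`, `as_matrix`, `matmul`) are invented here and do not come from the text, and the maps are stored with rational coefficients, so the code really models $\mathrm{Aff}(1, \mathbb{Q})$.

```python
from fractions import Fraction


def affine(a, b):
    """Represent f_{a,b}(x) = a*x + b by the pair (a, b); a must be nonzero."""
    if a == 0:
        raise ValueError("an affine map needs a != 0")
    return (Fraction(a), Fraction(b))


def evaluate(f, x):
    a, b = f
    return a * x + b


def compose(f, g):
    """Return f o g (apply g first): f_{a,b} o f_{c,d} = f_{ac, ad+b}."""
    (a, b), (c, d) = f, g
    return (a * c, a * d + b)


def inverse(f):
    """The inverse of f_{a,b} is f_{1/a, -b/a}."""
    a, b = f
    return (1 / a, -b / a)


def as_matrix(f):
    """The 2 x 2 matrix [[a, b], [0, 1]] attached to f_{a,b}."""
    a, b = f
    return ((a, b), (Fraction(0), Fraction(1)))


def matmul(m, n):
    """Product of two 2 x 2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


f, g = affine(2, 3), affine(5, -1)
h = compose(f, g)  # f_{2,3} o f_{5,-1} = f_{10,1}
```

Here `as_matrix(h)` agrees with `matmul(as_matrix(f), as_matrix(g))`, and `compose(f, inverse(f))` is the pair representing the identity map $f_{1,0}$.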

8 Projective geometry involves enlarging the plane (and higher-dimensional spaces) by ad-
joining “points at infinity.” The enlarged plane is called the projective plane, and the original
plane is called an affine plane. Affine functions are special functions between affine planes.
The following discussion is technical, and it can be skipped as long as the
reader is aware of the statement of Theorem 2.49. Informally, this theorem says
that if an operation is associative, then no parentheses are needed in products
involving $n \geq 3$ factors.

An n-expression is an $n$-tuple $(a_1, a_2, \ldots, a_n) \in G \times \cdots \times G$ ($n$ factors),
and it yields many elements of $G$ by the following procedure. Choose two adjacent
$a$'s, multiply them, and obtain an $(n-1)$-expression: the new product just
formed and $n - 2$ original $a$'s. In this shorter new expression, choose two adjacent
factors (either an original pair or an original one together with the new product
from the first step) and multiply them. Repeat until a 2-expression $(W, X)$
is reached; now multiply and obtain the element $WX$ in $G$. Call $WX$ an ultimate
product derived from the original expression. For example, consider the
4-expression $(a, b, c, d)$. Let us multiply $ab$, obtaining the 3-expression $(ab, c, d)$.
We may now choose either adjacent pair $ab, c$ or $c, d$; in either case, multiply
these and obtain 2-expressions $((ab)c, d)$ or $(ab, cd)$. The elements in either
of these last expressions can now be multiplied to give the ultimate products
$[(ab)c]d$ or $(ab)(cd)$. Other ultimate products derived from $(a, b, c, d)$ arise by
multiplying $bc$ or $cd$ as the first step, yielding $(a, bc, d)$ or $(a, b, cd)$. To say that
an operation is associative is to say that the two ultimate products arising from
3-expressions $(a, b, c)$ are equal. It is not obvious, even when an operation is
associative, whether all the ultimate products derived from a longer expression
are equal.

Definition. An n-expression $(a_1, a_2, \ldots, a_n)$ needs no parentheses if all ultimate
products derived from it are equal; that is, no matter what choices are made
of adjacent factors to multiply, all the resulting products in $G$ are equal.

Theorem 2.49 (Generalized Associativity). If $n \geq 3$, then every n-expression
$(a_1, a_2, \ldots, a_n)$ in a group $G$ needs no parentheses.

Remark. Note that neither the identity element nor inverses will be used in the
proof. Thus, the hypothesis of the theorem can be weakened by assuming that $G$
is only a semigroup; that is, $G$ is a nonempty set equipped with an associative
binary operation. ◄
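
The procedure that produces ultimate products translates directly into a short recursive function. The following Python sketch is only an illustration (the name `ultimate_products` is invented here): it repeatedly chooses two adjacent factors, multiplies them, and collects every element that can arise. For an associative operation the resulting set has one element, as Theorem 2.49 asserts; for a nonassociative operation such as subtraction it usually has several.

```python
def ultimate_products(factors, op):
    """Return the set of all ultimate products of the n-expression `factors`.

    `op` is any binary operation on the entries; no associativity is assumed.
    """
    factors = tuple(factors)
    if len(factors) == 1:
        return {factors[0]}
    results = set()
    for i in range(len(factors) - 1):
        # Multiply the adjacent pair in positions i, i + 1 to get a shorter expression.
        shorter = factors[:i] + (op(factors[i], factors[i + 1]),) + factors[i + 2:]
        results |= ultimate_products(shorter, op)
    return results


# String concatenation is associative: exactly one ultimate product.
concatenated = ultimate_products("abcde", lambda x, y: x + y)

# Subtraction is not associative: several ultimate products.
differences = ultimate_products((1, 2, 3, 4), lambda x, y: x - y)
```

The recursion explores every sequence of choices, so its running time grows like $(n-1)!$; it is meant for small experiments, not for long expressions.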

Proof. The proof is by (the second form of) induction. The base step $n = 3$
follows from associativity. For the inductive step, consider 2-expressions of $G$
obtained from an n-expression $(a_1, a_2, \ldots, a_n)$ after two series of choices:

\[
(W, X) = (a_1 \cdots a_i,\; a_{i+1} \cdots a_n) \quad\text{and}\quad (Y, Z) = (a_1 \cdots a_j,\; a_{j+1} \cdots a_n).
\]

We must prove that $WX = YZ$ in $G$. By induction, each of the elements
$W = a_1 \cdots a_i$, $X = a_{i+1} \cdots a_n$, $Y = a_1 \cdots a_j$, and $Z = a_{j+1} \cdots a_n$ is the (one
and only!) ultimate product from $m$-expressions with $m < n$. Without loss of
generality, we may assume that $i \leq j$. If $i = j$, then the inductive hypothesis
gives $W = Y$ and $X = Z$ in $G$, and so $WX = YZ$, as desired.

We may now assume that $i < j$. Let $A$ be the ultimate product from the
$i$-expression $(a_1, \ldots, a_i)$, let $B$ be the ultimate product from the expression
$(a_{i+1}, \ldots, a_j)$, and let $C$ be the ultimate product from the expression $(a_{j+1}, \ldots, a_n)$.
The group elements $A$, $B$, and $C$ are unambiguously defined, for the inductive
hypothesis says that each of the shorter expressions yields only one ultimate
product. Now $W = A$, for both are ultimate products from the $i$-expression
$(a_1, \ldots, a_i)$; $Z = C$ [both are ultimate products from the $(n - j)$-expression
$(a_{j+1}, \ldots, a_n)$]; $X = BC$ [both are ultimate products from the $(n - i)$-expression
$(a_{i+1}, \ldots, a_n)$]; and $Y = AB$ [both are ultimate products from the $j$-expression
$(a_1, \ldots, a_j)$]. We conclude that $WX = A(BC)$ and $YZ = (AB)C$, and so
associativity, the base step $n = 3$, gives $WX = YZ$, as desired. •

Definition. If $G$ is a group and if $a \in G$, define the powers 9 $a^n$, for $n \geq 1$,
inductively:

\[ a^1 = a \quad\text{and}\quad a^{n+1} = a a^n. \]

Define $a^0 = 1$ and, if $n$ is a positive integer, define

\[ a^{-n} = (a^{-1})^n. \]

We let the reader prove that $(a^{-1})^n = (a^n)^{-1}$; this is a special case of the
equation in Lemma 2.46(iii).

There is a hidden complication here. The first and second powers are fine:
$a^1 = a$ and $a^2 = aa$. There are two possible cubes: we have defined
$a^3 = a a^2 = a(aa)$, but there is another reasonable contender: $(aa)a = a^2 a$. If one
assumes associativity, then these are equal:

\[ a^3 = a a^2 = a(aa) = (aa)a = a^2 a. \]

Generalized associativity shows that all powers of an element are unambiguously
defined.

9 The terminology x square and x cube for $x^2$ and $x^3$ is, of course, geometric in origin.
Usage of the word power in this context goes back to Euclid, who wrote, “The power of a
line is the square of the same line” (from the first English translation of Euclid, in 1570, by
H. Billingsley). “Power” was the standard European rendition of the Greek dunamis (from
which dynamo derives). However, contemporaries of Euclid, e.g., Aristotle and Plato, often
used dunamis to mean amplification, and this seems to be a more appropriate translation, for
Euclid was probably thinking of a 1-dimensional line sweeping out a 2-dimensional square. (I
thank Donna Shalev for informing me of the classical usage of dunamis.)
Corollary 2.50. If $G$ is a group, if $a \in G$, and if $m, n \geq 1$, then

\[ a^{m+n} = a^m a^n \quad\text{and}\quad (a^m)^n = a^{mn}. \]

Proof. Both $a^{m+n}$ and $a^m a^n$ arise from the expression having $m + n$ factors each
equal to $a$; in the second instance, both $(a^m)^n$ and $a^{mn}$ arise from the expression
having $mn$ factors each equal to $a$. •

It follows that any two powers of an element $a$ in a group commute:

\[ a^m a^n = a^{m+n} = a^{n+m} = a^n a^m. \]

The proofs of the various statements in the next proposition, while straight- 
forward, are not short. 


Proposition 2.51 (Laws of Exponents). Let $G$ be a group, let $a, b \in G$, and
let $m$ and $n$ be (not necessarily positive) integers.

(i) If $a$ and $b$ commute, then $(ab)^n = a^n b^n$.

(ii) $(a^n)^m = a^{mn}$.

(iii) $a^m a^n = a^{m+n}$.

Proof. Exercises for the reader. •

The notation $a^n$ is the natural way to denote $a * a * \cdots * a$ if $a$ appears $n$ times.
However, if the operation is $+$, then it is more natural to denote $a + a + \cdots + a$
by $na$. Let $G$ be a group written additively; if $a, b \in G$ and $m$ and $n$ are (not
necessarily positive) integers, then Proposition 2.51 is usually rewritten:

(i) $n(a + b) = na + nb$.

(ii) $m(na) = (mn)a$.

(iii) $ma + na = (m + n)a$.
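
The laws of exponents can be spot-checked numerically in a small abelian group. The Python sketch below is an illustration under stated assumptions, not a proof: it uses the nonzero residues modulo the prime 11 under multiplication (the modulus `P`, and the names `power` and `laws_hold`, are choices made here), and it builds powers exactly as in the definition, with $a^{-n} = (a^{-1})^n$. It relies on the three-argument `pow(a, -1, P)` for modular inverses, which needs Python 3.8 or later.

```python
P = 11  # assumed modulus: the nonzero residues mod a prime form an abelian group


def power(a, n):
    """Compute a^n in the group of nonzero residues mod P, following the definition."""
    if n < 0:
        # a^{-n} = (a^{-1})^n, where a^{-1} is the inverse of a modulo P.
        return power(pow(a, -1, P), -n)
    result = 1  # a^0 = 1
    for _ in range(n):
        result = (a * result) % P  # a^{k+1} = a * a^k
    return result


def laws_hold(a, b, exponents):
    """Check the three laws of Proposition 2.51 for all m, n in `exponents`."""
    for m in exponents:
        for n in exponents:
            if power((a * b) % P, n) != (power(a, n) * power(b, n)) % P:
                return False  # (ab)^n = a^n b^n (the group is abelian)
            if power(power(a, n), m) != power(a, m * n):
                return False  # (a^n)^m = a^{mn}
            if (power(a, m) * power(a, n)) % P != power(a, m + n):
                return False  # a^m a^n = a^{m+n}
    return True


all_good = all(
    laws_hold(a, b, range(-6, 7)) for a in range(1, P) for b in range(1, P)
)
```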

Example 2.52. 

Suppose a deck of cards is shuffled, so that the order of the cards has changed
from 1, 2, 3, 4, . . ., 52 to 2, 1, 4, 3, . . ., 52, 51. If we shuffle again in the same
way, then the cards return to their original order. But a similar thing happens for
any permutation $\alpha$ of the 52 cards: if one repeats $\alpha$ sufficiently often, the deck is
eventually restored to its original order. One way to see this uses our knowledge
of permutations. Write $\alpha$ as a product of disjoint cycles, say, $\alpha = \beta_1 \beta_2 \cdots \beta_t$,
where $\beta_i$ is an $r_i$-cycle. Now $\beta_i^{r_i} = (1)$ for every $i$, by Exercise 2.22 on page 120,
and so $\beta_i^k = (1)$, where $k = r_1 \cdots r_t$. Since disjoint cycles commute,
Exercise 2.27 on page 121 gives

\[ \alpha^k = (\beta_1 \cdots \beta_t)^k = \beta_1^k \cdots \beta_t^k = (1). \]

Here is a more general result with a simpler proof (abstract algebra can be
easier than algebra): if $G$ is a finite group and $a \in G$, then $a^k = 1$ for some
$k \geq 1$. We use the argument in Lemma 2.23(i). Consider the sequence

\[ 1, a, a^2, \ldots, a^n, \ldots. \]

Since $G$ is finite, there must be a repetition occurring in this sequence: there are
integers $m > n$ with $a^m = a^n$, and hence $1 = a^m a^{-n} = a^{m-n}$. We have shown
that there is some positive power of $a$ equal to 1. Our original argument that
$\alpha^k = (1)$ for a permutation $\alpha$ of 52 cards is not worthless, for Proposition 2.54
will show that we may choose $k$ to be $\mathrm{lcm}(r_1, \ldots, r_t)$. ◄
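
The repetition argument is easy to act out on a computer. The Python sketch below is a hedged illustration: permutations are stored as tuples on $0, 1, \ldots, n-1$ (a 0-based convention chosen here, not the text's), and the names `compose`, `power_equal_to_identity`, and `swap_pairs` are invented for the example. The function lists the powers $1, a, a^2, \ldots$ until one of them repeats, say $a^m = a^n$ with $m > n$, and returns $m - n$.

```python
def compose(p, q):
    """Return p o q (apply q first) for permutations stored as tuples on 0..n-1."""
    return tuple(p[q[i]] for i in range(len(q)))


def power_equal_to_identity(a):
    """Return some k >= 1 with a^k = identity, found via a repetition a^m = a^n."""
    identity = tuple(range(len(a)))
    seen = {}  # maps each power met so far to its exponent
    current, exponent = identity, 0
    while current not in seen:
        seen[current] = exponent
        current = compose(a, current)
        exponent += 1
    m, n = exponent, seen[current]  # a^m = a^n with m > n
    return m - n


# The shuffle 2, 1, 4, 3, ..., 52, 51 written 0-based: swap each adjacent pair.
swap_pairs = tuple(i ^ 1 for i in range(52))
k_shuffle = power_equal_to_identity(swap_pairs)

# A disjoint 3-cycle and 2-cycle on five symbols.
k_mixed = power_equal_to_identity((1, 2, 0, 4, 3))
```

For the pair-swapping shuffle the function returns 2, matching the observation that shuffling twice restores the deck.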

Definition. Let $G$ be a group and let $a \in G$. If $a^k = 1$ for some $k \geq 1$, then the
smallest such exponent $k \geq 1$ is called the order of $a$; if no such power exists,
then one says that $a$ has infinite order.

The argument given in Example 2.52 shows that every element in a finite
group has finite order. In any group $G$, the identity has order 1, and it is the only
element in $G$ of order 1; an element has order 2 if and only if it is not the identity
and it is equal to its own inverse. The matrix $A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$ in the group $GL(2, \mathbb{R})$
has infinite order, for $A^k = \begin{bmatrix} 1 & k \\ 0 & 1 \end{bmatrix} \neq \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ for all $k \geq 1$.

Lemma 2.53. Let $G$ be a group and assume that $a \in G$ has finite order $k$. If
$a^n = 1$, then $k \mid n$. In fact, $\{n \in \mathbb{Z} : a^n = 1\}$ is the set of all the multiples of $k$.

Proof. It is easy to see that $I = \{n \in \mathbb{Z} : a^n = 1\} \subseteq \mathbb{Z}$ satisfies the hypotheses
of Corollary 1.34.

(i): $0 \in I$ because $a^0 = 1$.

(ii): If $n, m \in I$, then $a^n = 1$ and $a^m = 1$, so that $a^{n-m} = a^n a^{-m} = 1$; hence,
$n - m \in I$.

(iii): If $n \in I$ and $q \in \mathbb{Z}$, then $a^n = 1$ and $a^{qn} = (a^n)^q = 1$; hence, $qn \in I$.

Therefore, $I$ consists of all the multiples of $k$, where $k$ is the smallest positive
integer in $I$. But the smallest positive $k$ in $I$ is, by definition, the order of $a$.
Therefore, if $a^n = 1$, then $n \in I$, and so $n$ is a multiple of $k$. •

What is the order of a permutation in $S_n$?

Proposition 2.54. Let $\alpha \in S_n$.

(i) If $\alpha$ is an $r$-cycle, then $\alpha$ has order $r$.

(ii) If $\alpha = \beta_1 \cdots \beta_t$ is a product of disjoint $r_i$-cycles $\beta_i$, then $\alpha$ has order
$m = \mathrm{lcm}(r_1, \ldots, r_t)$.

(iii) If $p$ is a prime, then $\alpha$ has order $p$ if and only if it is a $p$-cycle or a product
of disjoint $p$-cycles.

Proof.

(i) This is Exercise 2.22(i) on page 120.

(ii) Each $\beta_i$ has order $r_i$, by (i). Suppose that $\alpha^M = (1)$. Since the $\beta_i$ commute,
$(1) = \alpha^M = (\beta_1 \cdots \beta_t)^M = \beta_1^M \cdots \beta_t^M$. By Exercise 2.28(ii) on page 121,
disjointness of the $\beta$'s implies that $\beta_i^M = (1)$ for each $i$, so that Lemma 2.53
gives $r_i \mid M$ for all $i$; that is, $M$ is a common multiple of $r_1, \ldots, r_t$. But if
$m = \mathrm{lcm}(r_1, \ldots, r_t)$, then it is easy to see that $\alpha^m = (1)$. Hence, $\alpha$ has order $m$.

(iii) Write $\alpha$ as a product of disjoint cycles and use (ii). •
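
Part (ii) of the proposition gives a fast way to compute the order of a permutation, and it can be compared with the brute-force method of taking powers until the identity appears. The Python sketch below is an illustration only; permutations are tuples on $0, \ldots, n-1$ (a convention chosen here), and the function names are invented for the example.

```python
from itertools import permutations
from math import gcd


def cycle_lengths(perm):
    """Lengths of the disjoint cycles of perm, fixed points included as 1-cycles."""
    seen, lengths = set(), []
    for start in range(len(perm)):
        if start in seen:
            continue
        length, i = 0, start
        while i not in seen:  # walk around the cycle containing `start`
            seen.add(i)
            i = perm[i]
            length += 1
        lengths.append(length)
    return lengths


def order_by_lcm(perm):
    """Order computed as the least common multiple of the cycle lengths."""
    m = 1
    for r in cycle_lengths(perm):
        m = m * r // gcd(m, r)
    return m


def order_by_powers(perm):
    """Order computed directly: the smallest k >= 1 with perm^k = identity."""
    identity = tuple(range(len(perm)))
    current, k = perm, 1
    while current != identity:
        current = tuple(perm[i] for i in current)  # perm o current
        k += 1
    return k


# The two methods agree on every element of S_5.
agree_on_s5 = all(
    order_by_lcm(p) == order_by_powers(p) for p in permutations(range(5))
)
```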

For example, a permutation in S n has order 2 if and only if it is either a 
transposition or a product of disjoint transpositions. 

We can now augment the table in Example 2.30. 


Cycle Structure     Number   Order   Parity

(1)                      1       1   Even
(1 2)                   10       2   Odd
(1 2 3)                 20       3   Even
(1 2 3 4)               30       4   Odd
(1 2 3 4 5)             24       5   Even
(1 2)(3 4 5)            20       6   Odd
(1 2)(3 4)              15       2   Even
                       ---
                       120

Table 2.3. Permutations in $S_5$
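
The entries of Table 2.3 can be recomputed by listing all 120 permutations of five symbols and grouping them by cycle structure. The Python sketch below is an illustration, with names (`cycle_type`, `order`, `parity`, `counts`) chosen here; a cycle structure is recorded as the tuple of lengths of the nontrivial cycles, so `()` stands for the identity and `(3, 2)` for the shape of (1 2)(3 4 5).

```python
from collections import Counter
from itertools import permutations
from math import gcd


def cycle_type(perm):
    """Lengths of the cycles of length > 1 in perm, sorted largest first."""
    seen, lengths = set(), []
    for start in range(len(perm)):
        length, i = 0, start
        while i not in seen:
            seen.add(i)
            i = perm[i]
            length += 1
        if length > 1:
            lengths.append(length)
    return tuple(sorted(lengths, reverse=True))


def order(ctype):
    """Least common multiple of the cycle lengths (Proposition 2.54)."""
    m = 1
    for r in ctype:
        m = m * r // gcd(m, r)
    return m


def parity(ctype):
    """An r-cycle is a product of r - 1 transpositions."""
    return "Even" if sum(r - 1 for r in ctype) % 2 == 0 else "Odd"


counts = Counter(cycle_type(p) for p in permutations(range(5)))
for ctype in sorted(counts):
    print(ctype, counts[ctype], order(ctype), parity(ctype))
```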


Symmetry 

We now present a connection between groups and symmetry. What do we mean
when we say that an isosceles triangle $\Delta$ is symmetric? Figure 2.10 shows
$\Delta = \triangle ABC$ with its base $AB$ on the x-axis and with the y-axis being the
perpendicular-bisector of $AB$. Close your eyes; let $\Delta$ be reflected in the y-axis
(so that the vertices $A$ and $B$ are interchanged); open your eyes. You cannot tell
that $\Delta$ has been reflected; that is, $\Delta$ is symmetric about the y-axis. On the other
hand, if $\Delta$ were reflected in the x-axis, then it would be obvious, once your eyes
are reopened, that a reflection had taken place; that is, $\Delta$ is not symmetric about
the x-axis. Reflection is a special kind of isometry.
Definition. An isometry of the plane is a function $\varphi \colon \mathbb{R}^2 \to \mathbb{R}^2$ that is distance
preserving: for all points $P = (a, b)$ and $Q = (c, d)$ in $\mathbb{R}^2$,

\[ \|\varphi(P) - \varphi(Q)\| = \|P - Q\|, \]

where $\|P - Q\| = \sqrt{(a - c)^2 + (b - d)^2}$ is the distance from $P$ to $Q$.

Let $P \cdot Q$ denote the dot product:

\[ P \cdot Q = ac + bd. \]

Now

\begin{align*}
(P - Q) \cdot (P - Q) &= P \cdot P - 2(P \cdot Q) + Q \cdot Q \\
&= (a^2 + b^2) - 2(ac + bd) + (c^2 + d^2) \\
&= (a^2 - 2ac + c^2) + (b^2 - 2bd + d^2) \\
&= (a - c)^2 + (b - d)^2 \\
&= \|P - Q\|^2.
\end{align*}

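It follows that every isometry $\varphi$ fixing the origin $O$ preserves dot products:

\[ \varphi(P) \cdot \varphi(Q) = P \cdot Q. \]

Indeed, the computation above can be rewritten as
$2(P \cdot Q) = \|P\|^2 + \|Q\|^2 - \|P - Q\|^2$, where $\|P\| = \|P - O\|$. Since $\varphi(O) = O$,
we have $\|\varphi(P)\| = \|\varphi(P) - \varphi(O)\| = \|P\|$ and, similarly, $\|\varphi(Q)\| = \|Q\|$, so that

\[
2\bigl(\varphi(P) \cdot \varphi(Q)\bigr) = \|\varphi(P)\|^2 + \|\varphi(Q)\|^2 - \|\varphi(P) - \varphi(Q)\|^2 = \|P\|^2 + \|Q\|^2 - \|P - Q\|^2 = 2(P \cdot Q).
\]

(The hypothesis $\varphi(O) = O$ is needed: the function $U \mapsto U + V$ with $V \neq O$ preserves
distance but not dot products.)
Recall the formula giving the geometric interpretation of the dot product:

\[ P \cdot Q = \|P\| \, \|Q\| \cos\theta, \]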
where $\theta$ is the angle between $P$ and $Q$. It follows that every isometry fixing $O$ preserves
angles. In particular, $P$ and $Q$ are orthogonal if and only if $P \cdot Q = 0$, and
so such isometries preserve perpendicularity. Conversely, if $\varphi$ preserves dot products,
that is, if $\varphi(P) \cdot \varphi(Q) = P \cdot Q$, then the formula $(P - Q) \cdot (P - Q) = \|P - Q\|^2$
shows that $\varphi$ is an isometry.

We denote the set of all isometries of the plane by $\mathrm{Isom}(\mathbb{R}^2)$; its subset
consisting of all those isometries $\varphi$ with $\varphi(O) = O$ is called the orthogonal
group of the plane, and it is denoted by $O_2(\mathbb{R})$. We will see, in Proposition 2.59,
that both $\mathrm{Isom}(\mathbb{R}^2)$ and $O_2(\mathbb{R})$ are groups under composition.

We introduce some notation to help us analyze isometries. 

Notation. If P and Q are distinct points in the plane, let L[P, Q ] denote the line 
they determine, and let PQ denote the line segment with endpoints P and Q. 

Here are some examples of isometries. 


Example 2.55. 


(i) Given an angle $\theta$, rotation $R_\theta$ about the origin $O$ is defined as follows:
$R_\theta(O) = O$; if $P \neq O$, draw the line segment $PO$ in Figure 2.11, rotate
it $\theta$ (counterclockwise if $\theta$ is positive, clockwise if $\theta$ is negative) to $OP'$,
and define $R_\theta(P) = P'$. Of course, one can rotate about any point in the
plane.

(ii) Reflection $\rho_\ell$ in a line $\ell$, called its axis, fixes each point in $\ell$; if $P \notin \ell$, then
$\rho_\ell(P) = P'$, as in Figure 2.12 ($\ell$ is the perpendicular-bisector of $PP'$). If
one pretends that the axis $\ell$ is a mirror, then $P'$ is the mirror image of $P$.
Now $\rho_\ell \in \mathrm{Isom}(\mathbb{R}^2)$; if $\ell$ passes through the origin, then $\rho_\ell \in O_2(\mathbb{R})$.

(iii) Given a point $V$, translation 10 by $V$ is the function $\tau_V \colon \mathbb{R}^2 \to \mathbb{R}^2$ defined
by $\tau_V(U) = U + V$. Translations lie in $\mathrm{Isom}(\mathbb{R}^2)$; a translation $\tau_V$ fixes
the origin if and only if $V = O$, so that the identity is the only translation
which is also a rotation. ◄
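
The three kinds of isometry in this example can be written as explicit functions and tested numerically. The Python sketch below is an illustration under assumptions made here: points are pairs of floats, rotations and reflections are given by their standard coordinate formulas (which the text has not yet derived), and all names are invented for the example. Each map preserves distance; the rotation and the reflection, which fix the origin, also preserve dot products, while a nontrivial translation does not.

```python
import math


def rotation(theta):
    """Rotation about the origin by the angle theta (in radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return lambda p: (c * p[0] - s * p[1], s * p[0] + c * p[1])


def reflection_through_origin(theta):
    """Reflection whose axis passes through O at angle theta to the x-axis."""
    c, s = math.cos(2 * theta), math.sin(2 * theta)
    return lambda p: (c * p[0] + s * p[1], s * p[0] - c * p[1])


def translation(v):
    """Translation U -> U + V."""
    return lambda p: (p[0] + v[0], p[1] + v[1])


def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])


def dot(p, q):
    return p[0] * q[0] + p[1] * q[1]


P0, Q0 = (1.0, 2.0), (3.0, -1.0)
rot = rotation(0.7)
ref = reflection_through_origin(0.3)
tau = translation((1.0, 0.0))

for name, f in (("rotation", rot), ("reflection", ref), ("translation", tau)):
    print(name, dist(f(P0), f(Q0)), dot(f(P0), f(Q0)))
```

The printed distances all equal the distance from `P0` to `Q0`, up to rounding; only the translation changes the dot product.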

Lemma 2.56. If $\varphi$ is an isometry of the plane, then distinct points $P$, $Q$, $R$ in
$\mathbb{R}^2$ are collinear if and only if $\varphi(P)$, $\varphi(Q)$, $\varphi(R)$ are collinear.

Proof. Suppose that $P$, $Q$, $R$ are collinear. Choose notation so that $R$ is between
$P$ and $Q$; hence, $\|P - Q\| = \|P - R\| + \|R - Q\|$. If $\varphi(P)$, $\varphi(Q)$, $\varphi(R)$
are not collinear, then they are the vertices of a triangle. The triangle inequality
gives

\[ \|\varphi(P) - \varphi(Q)\| < \|\varphi(P) - \varphi(R)\| + \|\varphi(R) - \varphi(Q)\|, \]

contradicting $\varphi$ preserving distance. A similar argument proves the converse. If
$P$, $Q$, $R$ are not collinear, then they are the vertices of a triangle. If $\varphi(P)$, $\varphi(Q)$,
$\varphi(R)$ are collinear, then the strict inequality displayed above now becomes an
equality, contradicting $\varphi$ preserving distance. •

Every isometry $\varphi$ is an injection. If $P \neq Q$, then $\|P - Q\| \neq 0$, so that
$\|\varphi(P) - \varphi(Q)\| = \|P - Q\| \neq 0$; hence, $\varphi(P) \neq \varphi(Q)$. It is less obvious that
isometries are surjections, but we will soon see that they are.

Proposition 2.57. Every isometry $\varphi$ fixing the origin is a linear transformation.

Proof. Let $C_d = \{Q \in \mathbb{R}^2 : \|Q - O\| = d\}$ be the circle of radius $d > 0$ having
center $O$. We claim that $\varphi(C_d) \subseteq C_d$. If $P \in C_d$, then $\|P - O\| = d$; since $\varphi$
preserves distance, $d = \|\varphi(P) - \varphi(O)\| = \|\varphi(P) - O\|$; thus, $\varphi(P) \in C_d$.

Let $P \neq O$ be a point in $\mathbb{R}^2$, and let $r \in \mathbb{R}$. If $\|P - O\| = p$, then
$\|rP - O\| = |r|p$. Hence, $rP \in L[O, P] \cap C_{|r|p}$, where $C_{|r|p}$ is the circle
with center $O$ and radius $|r|p$. Since $\varphi$ preserves collinearity, by Lemma 2.56,
$\varphi(L[O, P] \cap C_{|r|p}) \subseteq L[O, \varphi(P)] \cap C_{|r|p}$; that is, $\varphi(rP) = \pm r\varphi(P)$ (for a line
intersects a circle in at most two points).

If we eliminate the possibility $\varphi(rP) = -r\varphi(P)$, then we can conclude that
$\varphi(rP) = r\varphi(P)$. In case $r > 0$, the origin $O$ lies between $-rP$ and $P$, and so
the distance from $-rP$ to $P$ is $rp + p$. On the other hand, the distance from
$rP$ to $P$ is $|rp - p|$ (if $r > 1$, the distance is $rp - p$; if $0 < r < 1$, then the
distance is $p - pr$). But $rp + p \neq |rp - p|$, and so $\varphi(rP) \neq -r\varphi(P)$ (because
$\varphi$ preserves distance). A similar argument works in case $r < 0$.

10 The word translation comes from the Latin word meaning “to transfer.” It usually means
passing from one language to another, but here it means a special way of moving each point
to another.
It remains to prove that $\varphi(P + Q) = \varphi(P) + \varphi(Q)$. If $O$, $P$, $Q$ are collinear,
then choose a point $U$ on the line $L[O, P]$ whose distance to the origin is 1.
Thus, $P = pU$, $Q = qU$, and $P + Q = (p + q)U$. The points $O = \varphi(O)$,
$\varphi(U)$, $\varphi(P)$, $\varphi(Q)$ are collinear. Since $\varphi$ preserves scalar multiplication, we
have

\begin{align*}
\varphi(P) + \varphi(Q) &= \varphi(pU) + \varphi(qU) \\
&= p\varphi(U) + q\varphi(U) \\
&= (p + q)\varphi(U) \\
&= \varphi((p + q)U) \\
&= \varphi(P + Q).
\end{align*}

If $O$, $P$, $Q$ are not collinear, then $P + Q$ is given by the parallelogram law:
$P + Q$ is the point $S$ such that $O$, $P$, $Q$, $S$ are the vertices of a parallelogram.
Since $\varphi$ preserves distance, the points $O = \varphi(O)$, $\varphi(P)$, $\varphi(Q)$, $\varphi(S)$ are the
vertices of a parallelogram, and so $\varphi(S) = \varphi(P) + \varphi(Q)$. But $S = P + Q$, and
so $\varphi(P + Q) = \varphi(P) + \varphi(Q)$, as desired. •

Corollary 2.58. Every isometry $\varphi \colon \mathbb{R}^2 \to \mathbb{R}^2$ is a bijection, and every isometry
fixing $O$ is a nonsingular linear transformation.

Proof. Let us first assume that $\varphi$ fixes the origin: $\varphi(O) = O$. By Proposition 2.57,
$\varphi$ is a linear transformation. Since $\varphi$ is injective, $P = \varphi(e_1)$, $Q = \varphi(e_2)$
is a basis of $\mathbb{R}^2$, where $e_1 = (1, 0)$, $e_2 = (0, 1)$ is the standard basis of $\mathbb{R}^2$.
It follows that the function $\psi \colon \mathbb{R}^2 \to \mathbb{R}^2$, defined by $\psi \colon aP + bQ \mapsto ae_1 + be_2$,
is a (single-valued) function, and that $\psi$ and $\varphi$ are inverse functions. Therefore,
$\varphi$ is a bijection, and hence it is nonsingular.

Suppose that $\varphi$ is any isometry, so that $\varphi(O) = U$. Now $\tau_{-U} \circ \varphi \colon O \mapsto U \mapsto O$,
so that $\tau_{-U} \circ \varphi = \theta$, where $\theta$ is a nonsingular linear transformation.
Therefore, $\varphi = \tau_U \circ \theta$ is a bijection, being the composite of bijections. •
We will study $\mathrm{Isom}(\mathbb{R}^2)$ more carefully in Chapter 6. In particular, we will
see that all isometries are either rotations, reflections, translations, or a fourth
type, glide-reflections.

Definition. The orthogonal group $O_2(\mathbb{R})$ is the set of all isometries of the
plane which fix the origin.

Proposition 2.59. Both $\mathrm{Isom}(\mathbb{R}^2)$ and $O_2(\mathbb{R})$ are groups under composition.

Proof. We show that $\mathrm{Isom}(\mathbb{R}^2)$ is a group. Clearly, the identity function 1 on $\mathbb{R}^2$
is an isometry, so that $1 \in \mathrm{Isom}(\mathbb{R}^2)$. Let $\varphi'$ and $\varphi$ be isometries. For all points
$P$ and $Q$, we have

\begin{align*}
\|(\varphi'\varphi)(P) - (\varphi'\varphi)(Q)\| &= \|\varphi'(\varphi(P)) - \varphi'(\varphi(Q))\| \\
&= \|\varphi(P) - \varphi(Q)\| \\
&= \|P - Q\|,
\end{align*}

and so $\varphi'\varphi$ is also an isometry; that is, composition is an operation on $\mathrm{Isom}(\mathbb{R}^2)$.
If $\varphi \in \mathrm{Isom}(\mathbb{R}^2)$, then $\varphi$ is a bijection, by Corollary 2.58, and so it has an inverse
$\varphi^{-1}$. Now $\varphi^{-1}$ is also an isometry:

\[ \|P - Q\| = \|\varphi(\varphi^{-1}(P)) - \varphi(\varphi^{-1}(Q))\| = \|\varphi^{-1}(P) - \varphi^{-1}(Q)\|. \]

Therefore, $\mathrm{Isom}(\mathbb{R}^2)$ is a group, for composition of functions is always associative,
by Lemma 2.6.

The reader may adapt this proof to show that $O_2(\mathbb{R})$ is also a group. •

Corollary 2.60. If $O$, $P$, $Q$ are noncollinear points, and if $\varphi$ and $\psi$ are isometries
of the plane fixing $O$ such that $\varphi(P) = \psi(P)$ and $\varphi(Q) = \psi(Q)$, then $\varphi = \psi$.

Proof. Since $O$, $P$, $Q$ are noncollinear points, the list $P$, $Q$ is linearly independent
in the vector space $\mathbb{R}^2$. Since $\dim(\mathbb{R}^2) = 2$, this is a basis [see Corollary 4.24(ii)].
By Proposition 2.57, $\varphi$ and $\psi$ are linear transformations, and any two linear
transformations that agree on a basis are equal (see Corollary 4.62). •

Let us return to symmetry. 

Example 2.61. 

If $\Delta$ is a triangle with vertices $P$, $Q$, $U$ and if $\varphi$ is an isometry, then $\varphi(\Delta)$ is the
triangle with vertices $\varphi(P)$, $\varphi(Q)$, $\varphi(U)$. If we assume further that $\varphi(\Delta) = \Delta$,
then $\varphi$ permutes the vertices $P$, $Q$, $U$ (see Figure 2.14). Assume that the center
of $\Delta$ is $O$. If $\Delta$ is isosceles (with equal sides $PQ$ and $PU$), and if $\rho_\ell$ is the
reflection with axis $\ell = L[O, P]$, then $\rho_\ell(\Delta) = \Delta$ (we can thus describe $\rho_\ell$
by the transposition $(Q\ U)$, for it fixes $P$ and interchanges $Q$ and $U$); on the
other hand, if $\Delta$ is not isosceles, then $\rho_{\ell'}(\Delta) \neq \Delta$, where $\ell' = L[O, Q]$. If $\Delta$
is equilateral, then $\rho_{\ell'}(\Delta) = \Delta$ and $\rho_{\ell''}(\Delta) = \Delta$, where $\ell'' = L[O, U]$ [we can
describe these reflections by the transpositions $(P\ U)$ and $(P\ Q)$, respectively];
these reflections do not carry $\Delta$ into itself when $\Delta$ is only isosceles. Moreover,
the rotations about $O$ by 120° and 240° also carry $\Delta$ into itself [these rotations
can be described by the 3-cycles $(P\ Q\ U)$ and $(P\ U\ Q)$]. We see that an
equilateral triangle is “more symmetric” than an isosceles triangle, and that an
isosceles triangle is “more symmetric” than a triangle $\Delta$ that is not even isosceles
[for such a triangle, $\varphi(\Delta) = \Delta$ implies that $\varphi = 1$]. ◄

Figure 2.14 Equilateral, Isosceles, Scalene

Definition. The symmetry group $\Sigma(\Omega)$ of a figure $\Omega$ in the plane is the set of
all isometries $\varphi$ of the plane with $\varphi(\Omega) = \Omega$. The elements of $\Sigma(\Omega)$ are called
symmetries of $\Omega$.

It is straightforward to see that $\Sigma(\Omega)$ is always a group.

Example 2.62. 

(i) A regular 3-gon $\pi_3$ is an equilateral triangle, and $|\Sigma(\pi_3)| = 6$, as we saw
in the previous example.

(ii) Let $\pi_4$ be a square (a regular 4-gon) having vertices $\{v_0, v_1, v_2, v_3\}$; draw
$\pi_4$ in the plane so that its center is at the origin $O$ and its sides are parallel
to the axes. It is easy to see that every $\varphi \in \Sigma(\pi_4)$ permutes the vertices;
indeed, a symmetry $\varphi$ of $\pi_4$ is determined by $\{\varphi(v_i) : 0 \leq i \leq 3\}$, and so
there are at most $24 = 4!$ possible symmetries. Not every permutation in
$S_4$ arises from a symmetry of $\pi_4$, however. If $v_i$ and $v_j$ are adjacent, then
$\|v_i - v_j\| = 1$, but $\|v_0 - v_2\| = \sqrt{2} = \|v_1 - v_3\|$; it follows that $\varphi$ must
preserve adjacency (for isometries preserve distance). There are only eight
symmetries of $\pi_4$ (this is proved in Theorem 2.63). Aside from the identity
and the three rotations about $O$ by 90°, 180°, and 270°, there are four
reflections, respectively, with axes $L[v_0, v_2]$, $L[v_1, v_3]$, the x-axis, and the
y-axis. The group $\Sigma(\pi_4)$ is called the dihedral group with 8 elements, and
it is denoted by $D_8$.

Figure 2.15 Symmetries of $\pi_4$          Figure 2.16 Symmetries of $\pi_5$

(iii) The symmetry group $\Sigma(\pi_5)$ of a regular pentagon $\pi_5$ having vertices
$v_0, \ldots, v_4$ and center $O$ has 10 elements: the rotations about the origin
by $(72j)$°, where $0 \leq j \leq 4$, as well as the reflections with axes $L[O, v_k]$
for $0 \leq k \leq 4$ (Theorem 2.63 shows that there are no other symmetries).
The symmetry group $\Sigma(\pi_5)$ is called the dihedral group with 10 elements,
and it is denoted by $D_{10}$. ◄

The symmetry group $\Sigma(\pi_n)$ of a regular polygon $\pi_n$ with center at $O$ and
vertices $v_0, v_1, \ldots, v_{n-1}$ is called the dihedral 11 group $D_{2n}$. However, we give
a definition that does not depend on geometry.

Definition. A group $D_{2n}$ with exactly $2n$ elements is called a dihedral group
if it contains an element $a$ of order $n$ and an element $b$ of order 2 such that
$bab = a^{-1}$.

If $n = 2$, then a dihedral group $D_4$ is abelian; if $n \geq 3$, then $D_{2n}$ is not
abelian. Exercise 2.62 on page 166 shows that there is essentially only one dihedral
group with $2n$ elements (more precisely, any two such are isomorphic).

11 F. Klein was investigating those finite groups occurring as subgroups of the group of
isometries of $\mathbb{R}^3$. Some of these occur as symmetry groups of regular polyhedra [from the
Greek poly meaning “many” and hedron meaning “two-dimensional side”]. He invented a
degenerate polyhedron that he called a dihedron, from the Greek word di meaning “two,”
which consists of two congruent regular polygons of zero thickness pasted together. The
symmetry group of a dihedron is thus called a dihedral group.
Theorem 2.63. The symmetry group $\Sigma(\pi_n)$ of a regular n-gon $\pi_n$ is a dihedral
group with $2n$ elements.

Proof. Let $\pi_n$ have vertices $v_0, \ldots, v_{n-1}$ and center $O$. Define $a$ to be the
rotation about $O$ by $(360/n)$°:

\[
a(v_i) = \begin{cases} v_{i+1} & \text{if } 0 \leq i < n - 1 \\ v_0 & \text{if } i = n - 1. \end{cases}
\]

It is clear that $a$ has order $n$. Define $b$ to be the reflection with axis $L[O, v_0]$;
thus,

\[
b(v_i) = \begin{cases} v_0 & \text{if } i = 0 \\ v_{n-i} & \text{if } 1 \leq i \leq n - 1. \end{cases}
\]

It is clear that $b$ has order 2. There are $n$ distinct symmetries $1, a, a^2, \ldots, a^{n-1}$
(because $a$ has order $n$), and $b, ab, a^2b, \ldots, a^{n-1}b$ are all distinct as well (by
the cancellation law). If $a^s = a^r b$, where $0 \leq r, s \leq n - 1$, then
$a^s(v_i) = a^r b(v_i)$ for all $i$. Now $a^s(v_0) = v_s$ while $a^r b(v_0) = v_r$; hence,
$s = r$. If $i = 1$, then $a^r(v_1) = v_{r+1}$ while $a^r b(v_1) = v_{r-1}$ (subscripts are read
mod $n$), and $v_{r+1} \neq v_{r-1}$ because $n \geq 3$. Therefore,
$a^s \neq a^r b$ for all $r, s$, and we have exhibited $2n$ distinct symmetries in $\Sigma(\pi_n)$.

We now show that there are no other symmetries of $\pi_n$. We may assume
that $\pi_n$ has its center $O$ at the origin, and so every symmetry $\varphi$ fixes $O$; that is,
$\varphi$ is a linear transformation (by Proposition 2.57). The vertices adjacent to $v_0$,
namely, $v_1$ and $v_{n-1}$, are the closest vertices to $v_0$; that is, if $2 \leq i \leq n - 2$,
then $\|v_i - v_0\| > \|v_1 - v_0\|$. Therefore, if $\varphi(v_0) = v_j$, then $\varphi(v_1) = v_{j+1}$ or
$\varphi(v_1) = v_{j-1}$. In the first case, $a^j(v_0) = \varphi(v_0)$ and $a^j(v_1) = \varphi(v_1)$, so that
Corollary 2.60 gives $\varphi = a^j$. In the second case, $a^j b(v_0) = v_j$ and $a^j b(v_1) = v_{j-1}$,
and Corollary 2.60 gives $\varphi = a^j b$. Therefore, $|\Sigma(\pi_n)| = 2n$.

We have shown that $\Sigma(\pi_n)$ is a group with exactly $2n$ elements and which
contains elements $a$ and $b$ of orders $n$ and 2, respectively. It remains to show that
$bab = a^{-1}$. By Corollary 2.60, it suffices to evaluate each of these on $v_0$ and $v_1$.
But $bab(v_0) = v_{n-1} = a^{-1}(v_0)$ and $bab(v_1) = v_0 = a^{-1}(v_1)$. •
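
The vertex formulas for $a$ and $b$ in the proof can be turned into permutations of the subscripts $0, 1, \ldots, n-1$, which makes the dihedral relations easy to test. The Python sketch below is an illustration only (the names `compose` and `dihedral_elements` are invented here); it builds the $2n$ elements $a^r$ and $a^r b$ and leaves the verification of the defining properties to a few direct comparisons.

```python
def compose(p, q):
    """Return p o q (apply q first) for permutations stored as tuples on 0..n-1."""
    return tuple(p[q[i]] for i in range(len(q)))


def dihedral_elements(n):
    """Return (a, b, elements) for the symmetries of a regular n-gon, n >= 3.

    a sends v_i to v_{i+1} and b sends v_i to v_{n-i}, subscripts read mod n.
    """
    a = tuple((i + 1) % n for i in range(n))
    b = tuple((-i) % n for i in range(n))
    elements = set()
    power = tuple(range(n))  # a^0 is the identity
    for _ in range(n):
        elements.add(power)              # a^r
        elements.add(compose(power, b))  # a^r b
        power = compose(a, power)
    return a, b, elements


a, b, d10 = dihedral_elements(5)
a_inverse = tuple((i - 1) % 5 for i in range(5))
relation_holds = compose(b, compose(a, b)) == a_inverse  # bab = a^{-1}
```

For $n = 5$ the set `d10` has 10 elements and is closed under composition, in agreement with Example 2.62(iii).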

Symmetry arises in calculus when describing figures in the plane. We quote 
from Edwards and Penny, Calculus and Analytic Geometry, 3d ed., 1990, p. 456, 
as they describe different kinds of symmetry that might be enjoyed by a curve 
with equation f(x, y ) = 0. 

(i) Symmetry about the x -axis: the equation of the curve is unaltered when y 
is replaced by — y. 

(ii) Symmetry about the y-axis: the equation of the curve is unaltered when x 
is replaced by —x. 
(iii) Symmetry with respect to the origin: the equation of the curve is unaltered
when $x$ is replaced by $-x$ and $y$ is replaced by $-y$.

(iv) Symmetry about the 45° line $y = x$: the equation is unaltered when $x$ and
$y$ are interchanged.

In our language, their first symmetry is $\rho_x$, reflection with axis x-axis, the
second is $\rho_y$, reflection with axis y-axis, the third is $R_{180}$, rotation by 180°, and
the fourth is $\rho_L$, where $L$ is the 45° line. One can now say when a function
of two variables has symmetry. For example, a function $f(x, y)$ has the first
type of symmetry if $f(x, y) = f(x, -y)$. In this case, the graph $\Gamma$ of the equation
$f(x, y) = 0$ [consisting of all the points $(a, b)$ for which $f(a, b) = 0$] is
symmetric about the x-axis, for $(a, b) \in \Gamma$ implies $(a, -b) \in \Gamma$.


Exercises 


2.32 (i) Compute the order, inverse, and parity of

$\alpha = (1\ 2)(4\ 3)(1\ 3\ 5\ 4\ 2)(1\ 5)(1\ 3)(2\ 3)$.


(ii) What are the respective orders of the permutations in Exercises 2.19 and
2.25 on page 120?

2.33 (i) How many elements of order 2 are there in $S_5$ and in $S_6$?

(ii) How many elements of order 2 are there in $S_n$?

*2.34 Let $y$ be a group element of order $m$; if $m = pt$ for some prime $p$, prove that $y^t$
has order $p$.

*2.35 Let $G$ be a group and let $a \in G$ have order $pk$ for some prime $p$, where $k \geq 1$.
Prove that if there is $x \in G$ with $x^p = a$, then the order of $x$ is $p^2k$, and hence $x$
has larger order than $a$.

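2.36 Let $G = GL(2, \mathbb{Q})$, and let

\[
A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}
\quad\text{and}\quad
B = \begin{bmatrix} 0 & 1 \\ -1 & 1 \end{bmatrix}.
\]

Show that $A^4 = I = B^6$, but that $(AB)^n \neq I$ for all $n > 0$. Conclude that $AB$ can
have infinite order even though both factors $A$ and $B$ have finite order (this cannot
happen in a finite group).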

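2.37 (i) Prove, by induction on $k \geq 1$, that

\[
\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}^{k}
=
\begin{bmatrix} \cos k\theta & -\sin k\theta \\ \sin k\theta & \cos k\theta \end{bmatrix}.
\]

(ii) Find all the elements of finite order in $SO(2, \mathbb{R})$, the special orthogonal
group [see Example 2.48(iii)].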

*2.38 If $G$ is a group in which $x^2 = 1$ for every $x \in G$, prove that $G$ must be abelian.
[The Boolean groups $\mathcal{B}(X)$ of Example 2.47(ix) are such groups.]

2.39 Let $G$ be a finite group in which every element has a square root; that is, for each
$x \in G$, there exists $y \in G$ with $y^2 = x$. Prove that every element in $G$ has a unique
square root.

*2.40 If $G$ is a group with an even number of elements, prove that the number of elements
in $G$ of order 2 is odd. In particular, $G$ must contain an element of order 2.

2.41 What is the largest order of an element in $S_n$, where $n = 1, 2, \ldots, 10$?

*2.42 The stochastic 12 group $\Sigma(2, \mathbb{R})$ consists of all those matrices in $GL(2, \mathbb{R})$ whose
column sums are 1; that is, $\Sigma(2, \mathbb{R})$ consists of all the nonsingular matrices
$\begin{bmatrix} a & c \\ b & d \end{bmatrix}$ with $a + b = 1 = c + d$. [There are also stochastic groups $\Sigma(2, \mathbb{Q})$ and $\Sigma(2, \mathbb{C})$.]
Prove that the product of two stochastic matrices is again stochastic, and that the
inverse of a stochastic matrix is stochastic.

2.43 Show that the symmetry group $\Sigma(C)$ of a circle $C$ is infinite.

*2.44 Prove that every element in a dihedral group $D_{2n}$ has a unique factorization of the
form $a^i b^j$, where $0 \leq i < n$ and $j = 0$ or $1$.

2.45 Let $e_1 = (1, 0)$ and $e_2 = (0, 1)$. If $\varphi$ is an isometry of the plane fixing $O$, let
$\varphi(e_1) = (a, b)$, $\varphi(e_2) = (c, d)$, and let $A = \begin{bmatrix} a & c \\ b & d \end{bmatrix}$. Prove that $\det(A) = \pm 1$.

2.4 Subgroups and Lagrange’s Theorem 

A subgroup of a group G is a subset which is a group under the same opera- 
tion as in G. The following definition will help to make this last phrase precise. 

Definition. Let $*$ be an operation on a set $G$, and let $S \subseteq G$ be a subset. We
say that $S$ is closed under $*$ if $x * y \in S$ for all $x, y \in S$.

The operation on a group $G$ is a function $* \colon G \times G \to G$. If $S \subseteq G$, then
$S \times S \subseteq G \times G$, and to say that $S$ is closed under the operation $*$ means that
$*(S \times S) \subseteq S$. For example, the subset $\mathbb{Z}$ of the additive group $\mathbb{Q}$ of rational
numbers is closed under $+$. However, if $\mathbb{Q}^\times$ is the multiplicative group of nonzero
rational numbers, then $\mathbb{Q}^\times$ is closed under multiplication, but it is not closed
under $+$ (for example, $2$ and $-2$ lie in $\mathbb{Q}^\times$, but their sum $-2 + 2 = 0 \notin \mathbb{Q}^\times$).

Definition. A subset $H$ of a group $G$ is a subgroup if

(i) $1 \in H$;

(ii) if $x, y \in H$, then $xy \in H$; that is, $H$ is closed under $*$;

(iii) if $x \in H$, then $x^{-1} \in H$.

12 The term stochastic comes from the Greek word meaning “to guess.” Its mathematical
usage occurs in statistics, and stochastic matrices first arose in the study of certain statistical
problems.

We write $H \le G$ to denote $H$ being a subgroup of a group $G$. Observe
that $\{1\}$ and $G$ are always subgroups of a group $G$, where $\{1\}$ denotes the subset
consisting of the single element 1. We call a subgroup $H$ of $G$ proper if $H \ne G$,
and we write $H < G$. We call a subgroup $H$ of $G$ nontrivial if $H \ne \{1\}$. More
interesting examples of subgroups will be given below.

Proposition 2.64. Every subgroup $H \le G$ of a group $G$ is itself a group.

Proof. Axiom (ii) (in the definition of subgroup) shows that $H$ is closed under
the operation of $G$; that is, $H$ has an operation (namely, the restriction of the
operation $*: G \times G \to G$ to $H \times H \subseteq G \times G$). This operation is associative:
since the equation $(xy)z = x(yz)$ holds for all $x, y, z \in G$, it holds, in particular,
for all $x, y, z \in H$. Finally, axiom (i) gives the identity, and axiom (iii) gives
inverses. •

It is quicker to check that a subset H of a group G is a subgroup (and hence 
that it is a group in its own right) than to verify the group axioms for H, for as- 
sociativity is inherited from the operation on G and hence it need not be verified 
again. 

Example 2.65. 

(i) Recall that $\mathrm{Isom}(\mathbb{R}^2)$ is the group of all isometries of the plane. The
subset $O_2(\mathbb{R})$, consisting of all isometries fixing the origin, is a subgroup of
$\mathrm{Isom}(\mathbb{R}^2)$. If $\Omega \subseteq \mathbb{R}^2$, then the symmetry group $\Sigma(\Omega)$ is also a subgroup
of $\mathrm{Isom}(\mathbb{R}^2)$. If the center of gravity of $\Omega$ exists and is at the origin, then

$\Sigma(\Omega) \le O_2(\mathbb{R})$.

(ii) The four permutations

$\mathbf{V} = \{(1), (1\ 2)(3\ 4), (1\ 3)(2\ 4), (1\ 4)(2\ 3)\}$

form a group, because $\mathbf{V}$ is a subgroup of $S_4$: $(1) \in \mathbf{V}$; $\alpha^2 = (1)$ for each
$\alpha \in \mathbf{V}$, and so $\alpha^{-1} = \alpha \in \mathbf{V}$; the product of any two distinct permutations
in $\mathbf{V} - \{(1)\}$ is the third one. One calls $\mathbf{V}$ the four-group (or the Klein
group) ($\mathbf{V}$ abbreviates the original German term Vierergruppe).

Consider what verifying associativity $a(bc) = (ab)c$ would involve:
there are 4 choices for each of $a$, $b$, and $c$, and so there are $4^3 = 64$
equations to be checked. Of course, we may assume that none is $(1)$,
leaving us with only $3^3 = 27$ equations but, plainly, proving $\mathbf{V}$ is a group
by showing it is a subgroup of $S_4$ is the best way to proceed.

(iii) If $\mathbb{R}^2$ is the plane considered as an (additive) abelian group, then any line
$L$ through the origin is a subgroup. The easiest way to see this is to choose
a point $(a, b) \ne (0, 0)$ on $L$ and then note that $L$ consists of all the scalar multiples
$(ra, rb)$. The reader may now verify that the axioms in the definition of
subgroup do hold for $L$. ◄
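
The three axioms in the definition of subgroup can be checked mechanically for the four-group of part (ii). The following is only a sketch under stated assumptions: permutations of $\{1, 2, 3, 4\}$ are relabeled as tuples on the symbols 0, 1, 2, 3, and the helper names `compose`, `inverse`, and `is_subgroup` are ours, not notation from the text.

```python
def compose(f, g):
    """Return the composite f∘g (apply g first, then f) of two permutations
    written as tuples p with p[i] = image of i."""
    return tuple(f[g[i]] for i in range(len(g)))


def inverse(f):
    """Return the inverse permutation of f."""
    inv = [0] * len(f)
    for i, image in enumerate(f):
        inv[image] = i
    return tuple(inv)


def is_subgroup(H, identity):
    """Check axioms (i), (ii), (iii) of the definition of subgroup."""
    H = set(H)
    contains_identity = identity in H
    closed = all(compose(x, y) in H for x in H for y in H)
    has_inverses = all(inverse(x) in H for x in H)
    return contains_identity and closed and has_inverses


identity = (0, 1, 2, 3)
# (1 2)(3 4), (1 3)(2 4), (1 4)(2 3), rewritten on the symbols 0, 1, 2, 3
V = {identity, (1, 0, 3, 2), (2, 3, 0, 1), (3, 2, 1, 0)}
print(is_subgroup(V, identity))  # True
```

Only $4 + 16 + 4$ membership tests are made here; associativity is never tested, because it is inherited from $S_4$.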

One can shorten the list of items needed to verify that a subset is, in fact, a 
subgroup. 


Proposition 2.66. A subset $H$ of a group $G$ is a subgroup if and only if $H$ is
nonempty and, whenever $x, y \in H$, then $xy^{-1} \in H$.

Proof. If $H$ is a subgroup, then it is nonempty, for $1 \in H$. If $x, y \in H$, then
$y^{-1} \in H$, by part (iii) of the definition, and so $xy^{-1} \in H$, by part (ii).

Conversely, assume that $H$ is a subset satisfying the new condition. Since
$H$ is nonempty, it contains some element, say, $h$. Taking $x = h = y$, we see
that $1 = hh^{-1} \in H$, and so part (i) holds. If $y \in H$, then set $x = 1$ (which we
can now do because $1 \in H$), giving $y^{-1} = 1y^{-1} \in H$, and so part (iii) holds.
Finally, we know that $(y^{-1})^{-1} = y$, by Lemma 2.46. Hence, if $x, y \in H$, then
$y^{-1} \in H$, and so $xy = x(y^{-1})^{-1} \in H$. Therefore, $H$ is a subgroup of $G$. •

Since every subgroup contains 1, one may replace the hypothesis “$H$ is
nonempty” in Proposition 2.66 by “$1 \in H$.”

Note that if the operation in $G$ is addition, then the condition in the proposition
is that $H$ is a nonempty subset such that $x, y \in H$ implies $x - y \in H$.

For Galois, a group was just a subset $H$ of $S_n$ that is closed under composition;
that is, if $\alpha, \beta \in H$, then $\alpha\beta \in H$. A. Cayley, in 1854, was the first to define
an abstract group, mentioning associativity, inverses, and identity explicitly.


Proposition 2.67. A nonempty subset $H$ of a finite group $G$ is a subgroup if and
only if $H$ is closed under the operation of $G$; that is, if $a, b \in H$, then $ab \in H$.
In particular, a nonempty subset of $S_n$ is a subgroup if and only if it is closed
under composition.

Proof. Every subgroup is nonempty, by axiom (i) in the definition of subgroup,
and it is closed, by axiom (ii).

Conversely, assume that $H$ is a nonempty subset of $G$ closed under the operation
on $G$; thus, axiom (ii) holds. It follows that $H$ contains all the powers of its
elements. In particular, there is some element $a \in H$, because $H$ is nonempty,
and $a^n \in H$ for all $n \ge 1$. As we saw in Example 2.52, every element in $G$ has
finite order: there is an integer $m \ge 1$ with $a^m = 1$; hence $1 \in H$ and axiom (i) holds.
Finally, if $h \in H$ and $h^m = 1$, then $h^{-1} = h^{m-1}$ (for $hh^{m-1} = 1 = h^{m-1}h$), so
that $h^{-1} \in H$ and axiom (iii) holds. Therefore, $H$ is a subgroup of $G$. •

This last proposition can be false when $G$ is an infinite group. For example,
the subset $\mathbb{N}$ of the additive group $\mathbb{Z}$ is closed under $+$, but it is not a subgroup
of $\mathbb{Z}$.
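
Proposition 2.67 can be tested by brute force in a small finite group. The sketch below assumes the additive group of integers mod 12 as the ambient group (our choice of example) and uses helper names of our own; it lists every nonempty subset closed under addition and confirms that each one already satisfies all three subgroup axioms.

```python
from itertools import combinations


def is_closed(H, n):
    """True if the subset H of Z_n is closed under addition mod n."""
    return all((a + b) % n in H for a in H for b in H)


def is_subgroup(H, n):
    """Check all three subgroup axioms for a subset H of the additive group Z_n."""
    return 0 in H and is_closed(H, n) and all((-a) % n in H for a in H)


n = 12
# Every nonempty subset of Z_12 that is closed under addition.
closed_subsets = [
    set(subset)
    for size in range(1, n + 1)
    for subset in combinations(range(n), size)
    if is_closed(set(subset), n)
]
all_are_subgroups = all(is_subgroup(H, n) for H in closed_subsets)
print(all_are_subgroups)
print(sorted(sorted(H) for H in closed_subsets))
```

No such search is possible in $\mathbb{Z}$ itself, of course, and the example of $\mathbb{N}$ shows that the conclusion fails there.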


Example 2.68. 

The subset $A_n$ of $S_n$, consisting of all the even permutations, is a subgroup
because it is closed under multiplication: even $\circ$ even = even. This subgroup of
$S_n$ is called the alternating$^{13}$ group on $n$ letters, and it is denoted by $A_n$. ◄


Definition. If $G$ is a group and $a \in G$, write

$\langle a \rangle = \{a^n : n \in \mathbb{Z}\} = \{\text{all powers of } a\}$;

$\langle a \rangle$ is called the cyclic subgroup of $G$ generated by $a$.

A group $G$ is called cyclic if there is some $a \in G$ with $G = \langle a \rangle$; in this case
$a$ is called a generator of $G$.

It is easy to see that $\langle a \rangle$ is, in fact, a subgroup: $1 = a^0 \in \langle a \rangle$;
$a^n a^m = a^{n+m} \in \langle a \rangle$; $(a^n)^{-1} = a^{-n} \in \langle a \rangle$. Example 2.47(vi) shows, for every
$n \ge 1$, that the multiplicative group $\Gamma_n$ of all $n$th roots of unity is a cyclic group
with the primitive $n$th root of unity $\zeta = e^{2\pi i/n}$ as a generator.

A cyclic group can have several different generators. For example, $\langle a \rangle = \langle a^{-1} \rangle$.


Proposition 2.69. If $G = \langle a \rangle$ is a cyclic group of order $n$, then $a^k$ is a generator
of $G$ if and only if $\gcd(k, n) = 1$.

Proof. If $a^k$ is a generator, then $a \in \langle a^k \rangle$, so there is $s$ with $a = a^{ks}$. Hence,
$a^{ks-1} = 1$, so that Lemma 2.53 shows that $n \mid (ks - 1)$; that is, there is an integer
$t$ with $ks - 1 = tn$, or $sk - tn = 1$. Hence, $\gcd(k, n) = 1$, by Exercise 1.51 on
page 52.

Conversely, if $1 = sk + tn$, then $a = a^{sk+tn} = a^{sk}$ (because $a^{tn} = 1$), and
so $a \in \langle a^k \rangle$. Hence $G = \langle a \rangle \le \langle a^k \rangle$, and so $G = \langle a^k \rangle$. •

13 The alternating group first arose in studying polynomials. If

$f(x) = (x - u_1)(x - u_2) \cdots (x - u_n)$,

then the number $D = \prod_{i<j}(u_i - u_j)$ changes sign if one permutes the roots: if $\alpha$ is a
permutation of $\{u_1, u_2, \ldots, u_n\}$, then it is easy to see that $\prod_{i<j}[\alpha(u_i) - \alpha(u_j)] = \pm D$.

Thus, the sign of the product alternates as various permutations $\alpha$ are applied to its factors.
The sign does not change for those $\alpha$ in the alternating group.

Corollary 2.70. The number of generators of a cyclic group of order $n$ is $\phi(n)$.

Proof. This follows at once from Propositions 2.69 and 1.39. •
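
Proposition 2.69 and Corollary 2.70 are easy to confirm numerically. In the sketch below we assume the additive cyclic group of integers mod $n$ with generator 1, so that the "power" $a^k$ becomes the multiple $k$; the function names are ours.

```python
from math import gcd


def cyclic_subgroup(k, n):
    """The cyclic subgroup generated by k in the additive group Z_n."""
    return {(k * s) % n for s in range(n)}


def generators(n):
    """All k in Z_n whose cyclic subgroup is the whole group."""
    return [k for k in range(n) if len(cyclic_subgroup(k, n)) == n]


n = 12
print(generators(n))                            # [1, 5, 7, 11]
print([k for k in range(n) if gcd(k, n) == 1])  # [1, 5, 7, 11]
```

The two lists agree, and their common length, 4, is $\phi(12)$.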

Proposition 2.71. Every subgroup $S$ of a cyclic group $G = \langle a \rangle$ is itself cyclic.
In fact, $a^m$ is a generator of $S$, where $m$ is the smallest positive integer with
$a^m \in S$.

Proof. We may assume that $S$ is nontrivial; that is, $S \ne \{1\}$, for the proposition
is obviously true when $S = \{1\}$. Let $I = \{m \in \mathbb{Z} : a^m \in S\}$; it is easy to check
that $I$ satisfies the conditions in Corollary 1.34.

(i): $0 \in I$, for $a^0 = 1 \in S$.

(ii): If $m, n \in I$, then $a^m, a^n \in S$, and so $a^m a^{-n} = a^{m-n} \in S$; hence, $m - n \in I$.

(iii): If $m \in I$ and $i \in \mathbb{Z}$, then $a^m \in S$, and so $(a^m)^i = a^{im} \in S$; hence,
$im \in I$.

Since $S \ne \{1\}$, there is some $a^q \in S$ with $a^q \ne 1$; thus, $q \in I$ and $I \ne \{0\}$.
By Corollary 1.34, if $k$ is the smallest positive integer in $I$, then $k \mid m$ for every
$m \in I$. We claim that $\langle a^k \rangle = S$. Clearly, $\langle a^k \rangle \le S$. For the reverse inclusion,
take $s \in S$. Now $s = a^m$ for some $m$, so that $m \in I$ and $m = ki$ for some $i$.
Therefore, $s = a^m = a^{ki} \in \langle a^k \rangle$. •

Proposition 2.78 will give a number-theoretic interpretation of this last result. 

Proposition 2.72. Let $G$ be a finite group and let $a \in G$. Then the order of $a$
is the number of elements in $\langle a \rangle$.

Proof. We will use the idea in Lemma 2.23. Since $G$ is finite, there is an
integer $k \ge 1$ with $1, a, a^2, \ldots, a^{k-1}$ consisting of $k$ distinct elements, while
$1, a, a^2, \ldots, a^k$ has a repetition; hence $a^k \in \{1, a, a^2, \ldots, a^{k-1}\}$; that is,
$a^k = a^i$ for some $i$ with $0 \le i < k$. If $i \ge 1$, then $a^{k-i} = 1$, contradicting the original
list having no repetitions. Therefore, $a^k = a^0 = 1$, and $k$ is the order of $a$ (being
the smallest positive such $k$).

If $H = \{1, a, a^2, \ldots, a^{k-1}\}$, then $|H| = k$; it suffices to show that $H = \langle a \rangle$.
Clearly, $H \subseteq \langle a \rangle$. For the reverse inclusion, take $a^i \in \langle a \rangle$. By the division
algorithm, $i = qk + r$, where $0 \le r < k$. Hence
$a^i = a^{qk+r} = a^{qk}a^r = (a^k)^q a^r = a^r \in H$; this gives $\langle a \rangle \subseteq H$, and so $\langle a \rangle = H$. •
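
The first half of this proof is really a procedure: list the powers of $a$ until one repeats, and observe that the first repeated value is 1. Here is a minimal sketch, assuming as our example the multiplicative group of units modulo 15 (a standard finite group, though not one discussed in this section); the function name is ours.

```python
from math import gcd


def powers_until_repeat(a, n):
    """List 1, a, a^2, ... (mod n) until a value repeats.

    Returns the list of distinct powers and the first repeated value.
    Assumes gcd(a, n) == 1, so that a is an element of the group of units.
    """
    seen, power = [], 1 % n
    while power not in seen:
        seen.append(power)
        power = (power * a) % n
    return seen, power


n = 15
for a in (u for u in range(1, n) if gcd(u, n) == 1):
    distinct_powers, first_repeat = powers_until_repeat(a, n)
    # len(distinct_powers) plays the role of k in the proof
    print(a, distinct_powers, first_repeat)
```

In every row the repeated value is 1, and the length of the list is both the order of $a$ and the size of $\langle a \rangle$.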

Definition. If $G$ is a finite group, then the number of elements in $G$, denoted
by $|G|$, is called the order of $G$.

The word “order” is used in two senses: the order of an element $a \in G$ and
the order $|G|$ of a group $G$. Proposition 2.72 shows that the order of a group
element $a$ is equal to $|\langle a \rangle|$.

The following characterization of finite cyclic groups will be used to prove
Theorem 3.122 showing that the multiplicative group of a finite field is cyclic.

Proposition 2.73. A group $G$ of order $n$ is cyclic if and only if for each divisor
$d$ of $n$, there is at most one cyclic subgroup of order $d$.

Proof. Suppose that $G = \langle a \rangle$ is a cyclic group of order $n$. We claim that $\langle a^{n/d} \rangle$
has order $d$. Clearly, $(a^{n/d})^d = a^n = 1$, and it suffices to show that $d$ is the
smallest such positive integer. If $(a^{n/d})^r = 1$ with $r \ge 1$, then $n \mid (n/d)r$, by Lemma 2.53.
Hence, there is some integer $s \ge 1$ with $(n/d)r = ns$, so that $r = ds$ and $r \ge d$.

To prove uniqueness, let $C$ be a subgroup of $G$ of order $d$; by Proposition 2.71,
the subgroup $C$ is cyclic, say, $C = \langle x \rangle$. Now $x = a^m$ has order $d$,
so that $1 = (a^m)^d$. Hence, $n \mid md$, by Lemma 2.53, and so $md = nk$ for some
integer $k$. Therefore, $x = a^m = (a^{n/d})^k$, so that $C = \langle x \rangle \subseteq \langle a^{n/d} \rangle$. Since both
subgroups have the same order, however, it follows that $C = \langle a^{n/d} \rangle$.

Conversely, define a relation on a group $G$ by $a \equiv b$ if $\langle a \rangle = \langle b \rangle$. It is easy
to see that this is an equivalence relation and that the equivalence class $[a]$ of
$a \in G$ consists of all the generators of $C = \langle a \rangle$. Thus, we denote $[a]$ by $\mathrm{gen}(C)$,
and

$G = \bigcup_{C \text{ cyclic}} \mathrm{gen}(C)$.

Since distinct equivalence classes are disjoint, $n = |G| = \sum_C |\mathrm{gen}(C)|$, where the sum is
over all the cyclic subgroups of $G$. But Corollary 2.70 gives $|\mathrm{gen}(C)| = \phi(|C|)$. By hypothesis,
$G$ has at most one (cyclic) subgroup of any order, so that

$n = \sum_C |\mathrm{gen}(C)| \le \sum_{d \mid n} \phi(d) = n$,

the last equality being Corollary 1.28. Therefore, for each divisor $d$ of $n$, there
must be a cyclic subgroup $C$ of order $d$ contributing $\phi(d)$ to $\sum_C |\mathrm{gen}(C)|$. In
particular, there must be a cyclic subgroup $C$ of order $n$, and so $G$ is cyclic. •
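
The counting in this proof can be watched in a concrete cyclic group. The sketch below assumes the additive group of integers mod 12 (our choice) and sorts its elements by order; an element of order $d$ is a generator of the unique subgroup of order $d$, so the number of such elements should be $\phi(d)$. The function names are ours.

```python
from math import gcd


def phi(d):
    """Euler's function: the number of k in 1..d with gcd(k, d) = 1."""
    return sum(1 for k in range(1, d + 1) if gcd(k, d) == 1)


def additive_order(a, n):
    """Smallest k >= 1 with k*a congruent to 0 mod n."""
    k = 1
    while (k * a) % n != 0:
        k += 1
    return k


def generators_by_order(n):
    """Map each d to the number of elements of Z_n having order d."""
    counts = {}
    for a in range(n):
        d = additive_order(a, n)
        counts[d] = counts.get(d, 0) + 1
    return counts


n = 12
counts = generators_by_order(n)
for d in sorted(counts):
    print(d, counts[d], phi(d))
print(sum(counts.values()))  # 12
```

Each divisor $d$ of 12 appears exactly once, with count $\phi(d)$, and the counts add up to $n = 12$, as in the displayed inequality.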

Here is a way of constructing a new subgroup from given ones. 

Proposition 2.74. The intersection $\bigcap_{i \in I} H_i$ of any family of subgroups of a
group $G$ is again a subgroup of $G$. In particular, if $H$ and $K$ are subgroups of
$G$, then $H \cap K$ is a subgroup of $G$.

Proof. Let $D = \bigcap_{i \in I} H_i$; we prove that $D$ is a subgroup by verifying each of
the parts in the definition. Note first that $D \ne \varnothing$ because $1 \in D$, since $1 \in H_i$ for
all $i$. If $x \in D$, then $x$ got into $D$ by being in each $H_i$; as each $H_i$ is a subgroup,
$x^{-1} \in H_i$ for all $i$, and so $x^{-1} \in D$. Finally, if $x, y \in D$, then both $x$ and $y$ lie
in every $H_i$, hence their product $xy$ lies in every $H_i$, and so $xy \in D$. •

Corollary 2.75. If $X$ is a subset of a group $G$, then there is a subgroup $\langle X \rangle$ of
$G$ containing $X$ that is smallest in the sense that $\langle X \rangle \le H$ for every subgroup
$H$ of $G$ which contains $X$.

Proof. First of all, note that there exist subgroups of $G$ which contain $X$; for
example, $G$ itself contains $X$. Define $\langle X \rangle = \bigcap_{X \subseteq H} H$, the intersection of all
the subgroups $H$ of $G$ which contain $X$. By Proposition 2.74, $\langle X \rangle$ is a subgroup
of $G$; of course, $\langle X \rangle$ contains $X$ because every $H$ contains $X$. Finally, if $H$ is
any subgroup containing $X$, then $H$ is one of the subgroups whose intersection
is $\langle X \rangle$; that is, $\langle X \rangle \le H$. •

Note that there is no restriction on the subset $X$ in the last corollary; in
particular, $X = \varnothing$ is allowed. Since the empty set is a subset of every set, we
have $\varnothing \subseteq H$ for every subgroup $H$ of $G$. Thus, $\langle \varnothing \rangle$ is the intersection of all the
subgroups of $G$, one of which is $\{1\}$, and so $\langle \varnothing \rangle = \{1\}$.

Definition. If $X$ is a subset of a group $G$, then $\langle X \rangle$ is called the subgroup
generated by $X$.

Example 2.76.

(i) If $G = \langle a \rangle$ is a cyclic group with generator $a$, then $G$ is generated by the
subset $X = \{a\}$.

(ii) The symmetry group $\Sigma(\pi_n)$ of a regular $n$-gon $\pi_n$ is generated by $a, b$,
where $a$ is a rotation about the origin by $(360/n)^\circ$ and $b$ is a reflection (see
Theorem 2.63). These generators satisfy the conditions $a^n = 1$, $b^2 = 1$,
and $bab = a^{-1}$, and $\Sigma(\pi_n)$ is a dihedral group $D_{2n}$. ◄

The next proposition gives a more concrete description of the subgroup gen- 
erated by a subset. 

Definition. Let $X$ be a nonempty subset of a group $G$. Then a word on $X$ is
either the identity element or an element of $G$ of the form $w = x_1^{e_1} x_2^{e_2} \cdots x_n^{e_n}$,
where $n \ge 1$, $x_i \in X$ for all $i$, and $e_i = \pm 1$ for all $i$.

Proposition 2.77. If $X$ is a nonempty subset of a group $G$, then $\langle X \rangle$ is the set
of all the words on $X$.

Proof. We begin by showing that $W$, the set of all words on $X$, is a subgroup
of $G$. By definition, $1 \in W$. If $w, w' \in W$, then $w = x_1^{e_1} x_2^{e_2} \cdots x_n^{e_n}$
and $w' = y_1^{f_1} y_2^{f_2} \cdots y_m^{f_m}$, where $y_j \in X$ and $f_j = \pm 1$. Hence,
$ww' = x_1^{e_1} x_2^{e_2} \cdots x_n^{e_n} y_1^{f_1} y_2^{f_2} \cdots y_m^{f_m}$, which is a word on $X$, and so $ww' \in W$. Finally,
$w^{-1} = x_n^{-e_n} x_{n-1}^{-e_{n-1}} \cdots x_1^{-e_1} \in W$. Thus, $W$ is a subgroup of $G$, and it clearly
contains every element of $X$. We conclude that $\langle X \rangle \le W$. For the reverse
inclusion, we show that if $S$ is any subgroup of $G$ containing $X$, then $S$ contains
every word on $X$. But this is obvious: since $S$ is a subgroup, it contains $x^e$
whenever $x \in X$ and $e = \pm 1$, and it contains all possible products of such
elements. Therefore, $W \le S$ for all such $S$, and so $W \le \bigcap S = \langle X \rangle$. •
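
Proposition 2.77 suggests a way to compute $\langle X \rangle$ in a finite group: keep multiplying known words by letters $x^{\pm 1}$ until nothing new appears. The following sketch assumes permutations written as tuples on the symbols 0, 1, 2, 3; the function names and the particular labeling of the square's vertices are our own choices. It generates the symmetry group of a square from a rotation $a$ and a reflection $b$, as in Example 2.76(ii) with $n = 4$.

```python
def compose(f, g):
    """Return f∘g (apply g first, then f) for permutations given as tuples."""
    return tuple(f[g[i]] for i in range(len(g)))


def inverse(f):
    """Return the inverse permutation of f."""
    inv = [0] * len(f)
    for i, image in enumerate(f):
        inv[image] = i
    return tuple(inv)


def generated_subgroup(X, identity):
    """Collect all words on X by repeatedly appending letters x or x^{-1}."""
    letters = set(X) | {inverse(x) for x in X}
    words = {identity}
    frontier = [identity]
    while frontier:
        w = frontier.pop()
        for x in letters:
            longer = compose(w, x)
            if longer not in words:
                words.add(longer)
                frontier.append(longer)
    return words


identity = (0, 1, 2, 3)
a = (1, 2, 3, 0)  # rotation of the square with vertices 0, 1, 2, 3 by 90 degrees
b = (0, 3, 2, 1)  # reflection fixing vertices 0 and 2
D8 = generated_subgroup({a, b}, identity)
print(len(D8))                                  # 8
print(compose(b, compose(a, b)) == inverse(a))  # True
```

The search stops because the ambient group is finite; for an infinite group the same loop need not terminate.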

Proposition 2.78. Let $a$ and $b$ be integers and let $A = \langle a \rangle$ and $B = \langle b \rangle$ be the
cyclic subgroups of $\mathbb{Z}$ they generate.

(i) If $A + B$ is defined to be $\{x + y : x \in A \text{ and } y \in B\}$, then $A + B = \langle d \rangle$,
where $d = \gcd(a, b)$.

(ii) $A \cap B = \langle m \rangle$, where $m = \mathrm{lcm}(a, b)$.

Proof.

(i) It is straightforward to check that $A + B$ is a subgroup of $\mathbb{Z}$ (in fact, $A + B$ is
precisely the set of all the linear combinations of $a$ and $b$). By Proposition 2.71,
the subgroup $A + B$ is cyclic: $A + B = \langle d \rangle$, where $d$ can be chosen to be
non-negative (the smallest positive number in $A + B$ when $A + B \ne \{0\}$). Since
$a, b \in \langle d \rangle$, the number $d$ is a common divisor of $a$ and $b$; since $d$ is a linear
combination of $a$ and $b$, every common divisor of $a$ and $b$ divides $d$. Thus, $d$ is the
largest common divisor; that is, $d = \gcd(a, b)$.

(ii) If $c \in A \cap B$, then $c \in A$ and $a \mid c$; similarly, $c \in B$
and $b \mid c$. Thus, every element in $A \cap B$ is a common multiple of $a$ and $b$.
Conversely, every common multiple lies in the intersection. By Proposition 2.71,
the subgroup $A \cap B$ is cyclic: $A \cap B = \langle m \rangle$, where $m$ can be chosen to be
non-negative (the smallest positive number in $A \cap B$ when $A \cap B \ne \{0\}$). Therefore,
$m$ is the smallest common multiple; that is, $m = \mathrm{lcm}(a, b)$. •
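
A finite window of $\mathbb{Z}$ is enough to illustrate both parts of Proposition 2.78. The sketch below takes $a = 12$ and $b = 18$ (our choice of example) and compares $A + B$ and $A \cap B$, restricted to the integers from $-60$ to $60$, with the multiples of $\gcd(a, b)$ and $\mathrm{lcm}(a, b)$; the coefficient bound 50 is an assumption chosen large enough to reach every element of the window.

```python
from math import gcd

a, b = 12, 18
d = gcd(a, b)   # 6
m = a * b // d  # 36, the least common multiple of a and b

window = range(-60, 61)
coefficients = range(-50, 51)

# Linear combinations r*a + s*b with bounded coefficients.
A_plus_B = {r * a + s * b for r in coefficients for s in coefficients}
# Common multiples of a and b inside the window.
A_cap_B = {c for c in window if c % a == 0 and c % b == 0}

print({c for c in window if c in A_plus_B} == {c for c in window if c % d == 0})  # True
print(A_cap_B == {c for c in window if c % m == 0})                               # True
```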

Perhaps the most fundamental fact about subgroups $H$ of a finite group $G$ is
that their orders are constrained. Certainly, we have $|H| \le |G|$, but it turns out
that $|H|$ must be a divisor of $|G|$. To prove this, we introduce the notion of coset.

Definition. If $H$ is a subgroup of a group $G$ and $a \in G$, then the coset$^{14}$ $aH$ is
the subset $aH$ of $G$, where

$aH = \{ah : h \in H\}$.

Of course, $a = a1 \in aH$. Cosets are usually not subgroups. For example,
if $a \notin H$, then $1 \notin aH$ (otherwise $1 = ah$ for some $h \in H$, and this gives the
contradiction $a = h^{-1} \in H$).

If we use the $*$ notation for the operation in a group $G$, then we denote the
coset $aH$ by $a * H$, where

$a * H = \{a * h : h \in H\}$.

14 The cosets just defined are often called left cosets; there are also right cosets of $H$,
namely, subsets of the form $Ha = \{ha : h \in H\}$; these arise in further study of groups,
but we shall work almost exclusively with (left) cosets.

In particular, if the operation is addition, then the coset is denoted by

$a + H = \{a + h : h \in H\}$.


Example 2.79. 


(i) Consider the plane $\mathbb{R}^2$ as an (additive) abelian group and let $\ell$ be a line
through the origin $O$ (see Figure 2.17 on page 152); as in Example 2.65(iii),
the line $\ell$ is a subgroup of $\mathbb{R}^2$. If $\beta \in \mathbb{R}^2$, then the coset $\beta + \ell$ is the line
$\ell'$ containing $\beta$ which is parallel to $\ell$, for if $r\alpha \in \ell$, then the parallelogram
law gives $\beta + r\alpha \in \ell'$.

Figure 2.17 The Coset $\beta + \ell$

(ii) If $G = S_3$ and $H = \langle (1\ 2) \rangle$, there are exactly three cosets of $H$, namely:

$H = \{(1), (1\ 2)\} = (1\ 2)H$,

$(1\ 3)H = \{(1\ 3), (1\ 2\ 3)\} = (1\ 2\ 3)H$,

$(2\ 3)H = \{(2\ 3), (1\ 3\ 2)\} = (1\ 3\ 2)H$,

each of which has size 2. ◄
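
The cosets in part (ii) can be listed by machine. In this sketch the permutations of $\{1, 2, 3\}$ are relabeled as tuples on the symbols 0, 1, 2 (an assumption of the encoding, as is the helper name `compose`), and products are computed from right to left, as in the text.

```python
from itertools import permutations


def compose(f, g):
    """Return f∘g (apply g first, then f) for permutations given as tuples."""
    return tuple(f[g[i]] for i in range(len(g)))


G = set(permutations(range(3)))  # S_3, written on the symbols 0, 1, 2
H = {(0, 1, 2), (1, 0, 2)}       # the subgroup generated by (1 2)

# Each coset aH is stored as a frozenset so that equal cosets coincide.
cosets = {frozenset(compose(a, h) for h in H) for a in G}

print(len(cosets))                            # 3
print(all(len(c) == len(H) for c in cosets))  # True
print(set().union(*cosets) == G)              # True
```

Six choices of $a$ produce only three distinct cosets, each of size $|H| = 2$, and together they exhaust $G$.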

Observe, in our examples, that different cosets of a given subgroup do not 
overlap. 

If $H$ is a subgroup of a group $G$, then the relation on $G$, defined by

$a \equiv b$ if $a^{-1}b \in H$,

is an equivalence relation on $G$. If $a \in G$, then $a^{-1}a = 1 \in H$, and $a \equiv a$;
hence, $\equiv$ is reflexive. If $a \equiv b$, then $a^{-1}b \in H$; since subgroups are closed under
inverses, $(a^{-1}b)^{-1} = b^{-1}a \in H$ and $b \equiv a$; hence $\equiv$ is symmetric. If $a \equiv b$ and
$b \equiv c$, then $a^{-1}b, b^{-1}c \in H$; since subgroups are closed under multiplication,
$(a^{-1}b)(b^{-1}c) = a^{-1}c \in H$, and $a \equiv c$. Therefore, $\equiv$ is transitive, and hence it
is an equivalence relation.

We claim that the equivalence class of $a \in G$ is the coset $aH$. If $x \equiv a$, then
$x^{-1}a \in H$; hence $a^{-1}x = (x^{-1}a)^{-1} \in H$, and so $x = a(a^{-1}x) \in aH$. Thus, $[a] \subseteq aH$. For the
reverse inclusion, it is easy to see that if $x = ah \in aH$, then
$x^{-1}a = (ah)^{-1}a = h^{-1}a^{-1}a = h^{-1} \in H$, so that $x \equiv a$ and $x \in [a]$. Hence, $aH \subseteq [a]$, and so
$[a] = aH$.


Lemma 2.80. Let $H$ be a subgroup of a group $G$, and let $a, b \in G$.

(i) $aH = bH$ if and only if $b^{-1}a \in H$. In particular, $aH = H$ if and only if
$a \in H$.

(ii) If $aH \cap bH \ne \varnothing$, then $aH = bH$.

(iii) $|aH| = |H|$ for all $a \in G$.

Proof.

(i) This is a special case of Lemma 2.19, for cosets are equivalence classes. The
second statement follows because $H = 1H$.

(ii) This is a special case of Proposition 2.20, for the equivalence classes comprise
a partition of $G$.

(iii) The function $f: H \to aH$, given by $f(h) = ah$, is easily seen to be a
bijection [its inverse $aH \to H$ is given by $ah \mapsto a^{-1}(ah) = h$]. Therefore, $H$
and $aH$ have the same number of elements, by Exercise 2.11 on page 101. •

Theorem 2.81 (Lagrange’s Theorem). If $H$ is a subgroup of a finite group
$G$, then $|H|$ is a divisor of $|G|$.

Proof. Let $\{a_1H, a_2H, \ldots, a_tH\}$ be the family of all the distinct cosets of $H$
in $G$. Then

$G = a_1H \cup a_2H \cup \cdots \cup a_tH$,

because each $g \in G$ lies in the coset $gH$, and $gH = a_iH$ for some $i$. Moreover,
Lemma 2.80(ii) shows that distinct cosets $a_iH$ and $a_jH$ are disjoint. It follows
that

$|G| = |a_1H| + |a_2H| + \cdots + |a_tH|$.

But $|a_iH| = |H|$ for all $i$, by Lemma 2.80(iii), so that $|G| = t|H|$. •

Definition. The index of a subgroup $H$ in $G$, denoted by $[G : H]$, is the number
of cosets of $H$ in $G$.

When $G$ is finite, the index $[G : H]$ is the number $t$ in the formula
$|G| = t|H|$ in the proof of Lagrange’s theorem, so that

$|G| = [G : H]|H|$.

This formula shows that the index $[G : H]$ is also a divisor of $|G|$.

Corollary 2.82. If $H$ is a subgroup of a finite group $G$, then

$[G : H] = |G|/|H|$.

Proof. This follows at once from Lagrange’s theorem. •

Recall Theorem 2.63: the symmetry group $\Sigma(\pi_n)$ of a regular $n$-gon is a
dihedral group of order $2n$. It contains a cyclic subgroup of order $n$, generated
by a rotation $a$, and the subgroup $\langle a \rangle$ has index $[\Sigma(\pi_n) : \langle a \rangle] = 2$. Thus, there
are two cosets: $\langle a \rangle$ and $b\langle a \rangle$, where $b$ is any symmetry outside of $\langle a \rangle$.

We now see why the orders of elements in $S_5$, displayed in Table 2.3 on
page 134, are divisors of 120. Corollary 2.144 explains why the number of
permutations in $S_5$ of any given cycle structure is a divisor of 120.

Corollary 2.83. If $G$ is a finite group and $a \in G$, then the order of $a$ is a divisor
of $|G|$.

Proof. By Proposition 2.72, the order of the element $a$ is equal to the order of
the subgroup $H = \langle a \rangle$, which is a divisor of $|G|$ by Lagrange’s theorem. •

Corollary 2.84. If a finite group $G$ has order $m$, then $a^m = 1$ for all $a \in G$.

Proof. By Corollary 2.83, $a$ has order $d$, where $d \mid m$; that is, $m = dk$ for some
integer $k$. Thus, $a^m = a^{dk} = (a^d)^k = 1$. •
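
Corollaries 2.83 and 2.84 can be checked in any finite group whose multiplication is easy to compute. As a sketch, we assume the multiplicative group of units modulo 20 (a standard example of a finite group, not one introduced in this section); the function names are ours.

```python
from math import gcd


def units(n):
    """The elements of the multiplicative group of units modulo n."""
    return [a for a in range(1, n) if gcd(a, n) == 1]


def order(a, n):
    """Smallest k >= 1 with a^k congruent to 1 mod n; assumes gcd(a, n) == 1."""
    k, power = 1, a % n
    while power != 1:
        power = (power * a) % n
        k += 1
    return k


n = 20
G = units(n)
m = len(G)  # the order of the group; here m = 8
for a in G:
    # columns: a, its order d, whether d divides m, whether a^m = 1
    print(a, order(a, n), m % order(a, n) == 0, pow(a, m, n) == 1)
```

Every row ends in `True True`: each element's order divides 8, and each element raised to the 8th power is 1.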

Corollary 2.85. If $p$ is a prime, then every group $G$ of order $p$ is cyclic.

Proof. Choose $a \in G$ with $a \ne 1$, and let $H = \langle a \rangle$ be the cyclic subgroup
generated by $a$. By Lagrange’s theorem, $|H|$ is a divisor of $|G| = p$. Since $p$ is
a prime and $|H| > 1$, it follows that $|H| = p = |G|$, and so $H = G$. •

Lagrange’s theorem says that the order of a subgroup of a finite group $G$ is
a divisor of $|G|$. Is the “converse” of Lagrange’s theorem true? That is, if $d$ is a
divisor of $|G|$, must there exist a subgroup of $G$ having order $d$? The answer is
“no”; Proposition 2.97 shows that the alternating group $A_4$ is a group of order 12
which has no subgroup of order 6.


Exercises 

2.46 (i) Define the special linear group by

$\mathrm{SL}(2, \mathbb{R}) = \{A \in \mathrm{GL}(2, \mathbb{R}) : \det(A) = 1\}$.

Prove that $\mathrm{SL}(2, \mathbb{R})$ is a subgroup of $\mathrm{GL}(2, \mathbb{R})$.

(ii) Prove that $\mathrm{GL}(2, \mathbb{Q})$ is a subgroup of $\mathrm{GL}(2, \mathbb{R})$.

*2.47 Give an example of two subgroups $H$ and $K$ of a group $G$ whose union $H \cup K$ is
not a subgroup of $G$.

*2.48 Let $G$ be a finite group with subgroups $H$ and $K$. If $H \le K$, prove that

$[G : H] = [G : K][K : H]$.

2.49 If $H$ and $K$ are subgroups of a group $G$ and if $|H|$ and $|K|$ are relatively prime,
prove that $H \cap K = \{1\}$.

2.50 Prove that every infinite group contains infinitely many subgroups.

*2.51 Let $G$ be a group of order 4. Prove that either $G$ is cyclic or $x^2 = 1$ for every
$x \in G$. Conclude, using Exercise 2.38 on page 143, that $G$ must be abelian.

2.52 (i) Prove that the stochastic group $\Sigma(2, \mathbb{R})$, the set of all nonsingular $2 \times 2$
matrices whose row sums are 1, is a subgroup of $\mathrm{GL}(2, \mathbb{R})$ (see
Exercise 2.42 on page 144).

(ii) Define $\Sigma'(2, \mathbb{R})$ to be the set of all nonsingular doubly stochastic matrices
(all row sums are 1 and all column sums are 1). Prove that $\Sigma'(2, \mathbb{R})$
is a subgroup of $\mathrm{GL}(2, \mathbb{R})$.

*2.53 Let $G$ be a finite group, and let $S$ and $T$ be (not necessarily distinct) nonempty
subsets. Prove that either $G = ST$ or $|G| \ge |S| + |T|$.

2.54 (i) If $\{S_i : i \in I\}$ is a family of subgroups of a group $G$, prove that an
intersection of cosets $\bigcap_{i \in I} x_iS_i$ is either empty or a coset of $\bigcap_{i \in I} S_i$.

(ii) (B. H. Neumann.) If a group $G$ is the set-theoretic union of finitely
many cosets,

$G = x_1S_1 \cup \cdots \cup x_nS_n$,

prove that at least one of the subgroups $S_i$ has finite index in $G$.

2.55 (i) Show that a left coset of $\langle (1\ 2) \rangle$ in $S_3$ may not be equal to a right coset of
$\langle (1\ 2) \rangle$ in $S_3$; that is, there is $\alpha \in S_3$ with $\alpha \langle (1\ 2) \rangle \ne \langle (1\ 2) \rangle \alpha$.

(ii) Let $G$ be a finite group and let $H \le G$ be a subgroup. Prove that the
number of left cosets of $H$ in $G$ is equal to the number of right cosets of
$H$ in $G$.


2.5 Homomorphisms 

An important problem is determining whether two given groups $G$ and $H$ are
somehow the same. For example, we have investigated $S_3$, the group of all
permutations of $X = \{1, 2, 3\}$. The group $S_Y$ of all the permutations of $Y = \{a, b, c\}$
is a group different from $S_3$ because permutations of $\{1, 2, 3\}$ are different from
permutations of $\{a, b, c\}$. But even though $S_3$ and $S_Y$ are different, they surely
bear a strong resemblance to each other (see Example 2.86). The notions of
homomorphism and isomorphism allow one to compare different groups, as we
shall see.

Definition. If $(G, *)$ and $(H, \circ)$ are groups (we have displayed the operation
in each), then a function $f: G \to H$ is a homomorphism$^{15}$ if

$f(x * y) = f(x) \circ f(y)$

for all $x, y \in G$. If $f$ is also a bijection, then $f$ is called an isomorphism. We say
that $G$ and $H$ are isomorphic, denoted by $G \cong H$, if there exists an isomorphism
$f: G \to H$. (In Exercise 2.57 on page 165, we will see that isomorphism is an
equivalence relation on any family of groups. In particular, if $G \cong H$, then
$H \cong G$.)

Two obvious examples of homomorphisms are the identity $1_G: G \to G$,
which is an isomorphism, and the trivial homomorphism $f: G \to H$, defined
by $f(a) = 1$ for all $a \in G$.

Here are more interesting examples. Let $\mathbb{R}$ be the group of all real numbers
with operation addition, and let $\mathbb{R}^{>}$ be the group of all positive real numbers with
operation multiplication. The function $f: \mathbb{R} \to \mathbb{R}^{>}$, defined by $f(x) = e^x$, is a
homomorphism, for if $x, y \in \mathbb{R}$, then

$f(x + y) = e^{x+y} = e^x e^y = f(x)f(y)$.

Now $f$ is an isomorphism, for its inverse function $g: \mathbb{R}^{>} \to \mathbb{R}$ is $\log(x)$. Therefore,
the additive group $\mathbb{R}$ is isomorphic to the multiplicative group $\mathbb{R}^{>}$. Note
that the inverse function $g$ is also an isomorphism:

$g(xy) = \log(xy) = \log(x) + \log(y) = g(x) + g(y)$.
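
Floating-point arithmetic cannot prove these identities, but it can illustrate them. The sketch below spot-checks the homomorphism property of $f(x) = e^x$ and of its inverse $g(x) = \log(x)$ at a few sample points of our own choosing, comparing values up to rounding error with `math.isclose`.

```python
import math


def f(x):
    """The map from the additive reals to the positive reals, x -> e^x."""
    return math.exp(x)


def g(x):
    """The inverse map, x -> log(x), defined for x > 0."""
    return math.log(x)


for x, y in [(0.5, 1.25), (-2.0, 3.0), (1.0, -1.0)]:
    # f carries sums to products
    print(math.isclose(f(x + y), f(x) * f(y)))

for u, v in [(2.0, 8.0), (0.1, 5.0)]:
    # g carries products to sums, and f undoes g
    print(math.isclose(g(u * v), g(u) + g(v)), math.isclose(f(g(u)), u))
```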

As a second example, we claim that the additive group $\mathbb{C}$ of complex numbers
is isomorphic to the additive group $\mathbb{R}^2$ [see Example 2.47(vii)]. Define
$f: \mathbb{C} \to \mathbb{R}^2$ by

$f: a + ib \mapsto (a, b)$.

It is easy to check that $f$ is a bijection; $f$ is a homomorphism because

$f([a + ib] + [a' + ib']) = f([a + a'] + i[b + b'])$

$= (a + a', b + b')$

$= (a, b) + (a', b')$

$= f(a + ib) + f(a' + ib')$.

Definition. Let $a_1, a_2, \ldots, a_n$ be a list with no repetitions of all the elements
of a finite group $G$ of order $n$. A multiplication table for $G$ is an $n \times n$ matrix
whose $ij$ entry is $a_ia_j$.

15 The word homomorphism comes from the Greek homo meaning “same” and morph
meaning “shape” or “form.” Thus, a homomorphism carries a group to another group (its image) of
similar form. The word isomorphism involves the Greek iso meaning “equal,” and isomorphic
groups have identical form.

$$
\begin{array}{c|ccccc}
G & a_1 & \cdots & a_j & \cdots & a_n \\ \hline
a_1 & a_1a_1 & \cdots & a_1a_j & \cdots & a_1a_n \\
\vdots & \vdots & & \vdots & & \vdots \\
a_i & a_ia_1 & \cdots & a_ia_j & \cdots & a_ia_n \\
\vdots & \vdots & & \vdots & & \vdots \\
a_n & a_na_1 & \cdots & a_na_j & \cdots & a_na_n
\end{array}
$$

Let us agree, when writing a multiplication table, that the identity element is
listed first; that is, $a_1 = 1$. In this case, the first row and first column of the table
merely repeat the listing above, and so we usually omit them.

Consider two almost trivial examples of groups: let $\Gamma_2$ denote the multiplicative
group $\{1, -1\}$, and let $\mathcal{P}$ denote the parity group [Example 2.47(viii)]. Here
are their multiplication tables:

$$
\Gamma_2: \quad
\begin{array}{|c|c|} \hline
1 & -1 \\ \hline
-1 & 1 \\ \hline
\end{array}
\qquad\qquad
\mathcal{P}: \quad
\begin{array}{|c|c|} \hline
\text{even} & \text{odd} \\ \hline
\text{odd} & \text{even} \\ \hline
\end{array}
$$

It is clear that $\Gamma_2$ and $\mathcal{P}$ are distinct groups; it is equally clear that there is no
significant difference between them. The notion of isomorphism formalizes this
idea; $\Gamma_2$ and $\mathcal{P}$ are isomorphic, for the function $f: \Gamma_2 \to \mathcal{P}$, defined by $f(1) =$
even and $f(-1) =$ odd, is an isomorphism, as the reader can quickly check.

There are many multiplication tables for a group $G$ of order $n$, one for each of
the $n!$ lists of its elements. If $a_1, a_2, \ldots, a_n$ is a list of all the elements of $G$ with
no repetitions, and if $f: G \to H$ is a bijection, then $f(a_1), f(a_2), \ldots, f(a_n)$
is a list of all the elements of $H$ with no repetitions, and so this latter list determines
a multiplication table for $H$. That $f$ is an isomorphism says that if we
superimpose the multiplication table for $G$ (determined by $a_1, a_2, \ldots, a_n$) upon
the multiplication table for $H$ [determined by $f(a_1), f(a_2), \ldots, f(a_n)$], then the
tables match: if $a_ia_j$ is the $ij$ entry in the given multiplication table of $G$, then
$f(a_i)f(a_j) = f(a_ia_j)$ is the $ij$ entry of the multiplication table of $H$. In this
sense, isomorphic groups have the same multiplication table. Thus, isomorphic
groups are essentially the same, differing only in the notation for the elements
and the operations.

Example 2.86.

Here is an algorithm to check whether a given bijection $f: G \to H$ between a
pair of groups is actually an isomorphism: enumerate the elements $a_1, \ldots, a_n$ of
$G$, form the multiplication table of $G$ arising from this list, form the multiplication
table for $H$ from the list $f(a_1), \ldots, f(a_n)$, and compare the $n^2$ entries of
the two tables one row at a time.

We illustrate this for $G = S_3$, the symmetric group permuting $\{1, 2, 3\}$, and
$H = S_Y$, the symmetric group of all the permutations of $Y = \{a, b, c\}$. First,
enumerate $G$:

$(1), (1\ 2), (1\ 3), (2\ 3), (1\ 2\ 3), (1\ 3\ 2)$.

We define the obvious function $\varphi: S_3 \to S_Y$ that replaces numbers by letters:

$(1), (a\ b), (a\ c), (b\ c), (a\ b\ c), (a\ c\ b)$.

Compare the multiplication table for $S_3$ arising from this list of its elements with
the multiplication table for $S_Y$ arising from the corresponding list of its elements.
The reader should write out the complete tables of each and superimpose one on
the other to see that they match. We will illustrate this by checking the 4,5 entry.
The 4,5 position in the table for $S_3$ is the product $(2\ 3)(1\ 2\ 3) = (1\ 3)$, while the
4,5 position in the table for $S_Y$ is the product $(b\ c)(a\ b\ c) = (a\ c)$.

This result is generalized in Exercise 2.56 on page 165. ◄
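
The table-comparison algorithm of Example 2.86 translates directly into code. The following is a sketch only: the function names, the dictionary encoding of $f$, and the choice of the two-element groups $\{1, -1\}$ and $\{\text{even}, \text{odd}\}$ as test data are our assumptions.

```python
def multiplication_table(elements, op):
    """The table whose ij entry is op(elements[i], elements[j])."""
    return [[op(a, b) for b in elements] for a in elements]


def is_isomorphism(G_list, op_G, H_elements, op_H, f):
    """Compare the table of G with the table of H built from f(a_1), ..., f(a_n).

    G_list lists the elements of G without repetition; f is a dict G -> H.
    """
    H_list = [f[a] for a in G_list]
    # f must be a bijection onto H
    if len(set(H_list)) != len(G_list) or set(H_list) != set(H_elements):
        return False
    table_G = multiplication_table(G_list, op_G)
    table_H = multiplication_table(H_list, op_H)
    n = len(G_list)
    # superimpose the tables: f(a_i a_j) must be the ij entry of the table of H
    return all(f[table_G[i][j]] == table_H[i][j] for i in range(n) for j in range(n))


def times(a, b):
    return a * b


def add_parity(a, b):
    return "even" if a == b else "odd"


signs = [1, -1]
parity = ["even", "odd"]
print(is_isomorphism(signs, times, parity, add_parity, {1: "even", -1: "odd"}))  # True
print(is_isomorphism(signs, times, parity, add_parity, {1: "odd", -1: "even"}))  # False
```

The second bijection fails because it sends the identity 1 to odd, and odd + odd = even.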

We now turn from isomorphisms to more general homomorphisms. 

Lemma 2.87. Let $f: G \to H$ be a homomorphism.

(i) $f(1) = 1$;

(ii) $f(x^{-1}) = f(x)^{-1}$;

(iii) $f(x^n) = f(x)^n$ for all $n \in \mathbb{Z}$.

Proof.

(i) Applying $f$ to the equation $1 \cdot 1 = 1$ in $G$ gives the equation $f(1)f(1) = f(1)$
in $H$, and multiplying both sides by $f(1)^{-1}$ gives $f(1) = 1$.

(ii) Apply $f$ to the equation $x^{-1}x = 1$ in $G$ to obtain the equation $f(x^{-1})f(x) = 1$
in $H$. Proposition 2.45(iv), uniqueness of the inverse, gives $f(x^{-1}) = f(x)^{-1}$.

(iii) It is routine to prove by induction that $f(x^n) = f(x)^n$ for all $n \ge 0$. For
negative exponents, we have $(y^{-1})^n = y^{-n}$ for all $y$ in a group, and so

$f(x^{-n}) = f((x^{-1})^n) = f(x^{-1})^n = (f(x)^{-1})^n = f(x)^{-n}$. •


Example 2.88. 

We show that any two finite cyclic groups $G$ and $H$ of the same order $m$ are
isomorphic. It will then follow from Corollary 2.85 that any two groups of prime
order $p$ are isomorphic.

Suppose that $G = \langle x \rangle$ and $H = \langle y \rangle$. Define $f: G \to H$ by $f(x^i) = y^i$
for $0 \le i < m$. Now $G = \{1, x, x^2, \ldots, x^{m-1}\}$ and $H = \{1, y, y^2, \ldots, y^{m-1}\}$,
and so it follows that $f$ is a bijection. To see that $f$ is a homomorphism (and
hence an isomorphism), we must show that $f(x^ix^j) = f(x^i)f(x^j)$ for all $i$ and
$j$ with $0 \le i, j < m$. The desired equation clearly holds if $i + j < m$, for
$f(x^{i+j}) = y^{i+j}$, and so

$f(x^ix^j) = f(x^{i+j}) = y^{i+j} = y^iy^j = f(x^i)f(x^j)$.

If $i + j \ge m$, then $i + j = m + r$, where $0 \le r < m$, so that

$x^{i+j} = x^{m+r} = x^mx^r = x^r$

(because $x^m = 1$); similarly, $y^{i+j} = y^r$ (because $y^m = 1$). Hence

$f(x^ix^j) = f(x^{i+j}) = f(x^r) = y^r = y^{i+j} = y^iy^j = f(x^i)f(x^j)$.

Therefore, $f$ is an isomorphism and $G \cong H$. (See Example 2.115 for a nicer
proof of this.) ◄

A property of a group G that is shared by every other group isomorphic to it 
is called an invariant of G. For example, the order, |G|, is an invariant of G, for 
isomorphic groups have the same order. Being abelian is an invariant [if a and b 
commute, then ab = ba and 

$$f(a)f(b) = f(ab) = f(ba) = f(b)f(a);$$

hence, $f(a)$ and $f(b)$ commute]. Thus, $\mathbb{R}$ and $\mathrm{GL}(2, \mathbb{R})$ are not isomorphic,
for $\mathbb{R}$ is abelian and $\mathrm{GL}(2, \mathbb{R})$ is not. There are other invariants of a group (see
Exercise 2.59 on page 166); for example, the number of elements in it of any 
given order r, or whether or not the group is cyclic. In general, however, it is a 
challenge to decide whether two given groups are isomorphic. 
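One of the invariants just mentioned, the number of elements of each order, is easy to tabulate by machine. The sketch below is a hedged illustration: permutations are stored as tuples on $\{0, 1, 2, 3\}$ (a 0-indexed convention chosen here), and the helper names `compose`, `order`, and `order_profile` are hypothetical. It compares the four-group $V$ with the cyclic group generated by a 4-cycle.

```python
from collections import Counter

def compose(p, q):
    """Return the permutation p∘q (apply q first); permutations are tuples on 0..n-1."""
    return tuple(p[q[i]] for i in range(len(q)))

def order(p):
    """Smallest k >= 1 with p^k equal to the identity."""
    identity = tuple(range(len(p)))
    k, power = 1, p
    while power != identity:
        power = compose(p, power)
        k += 1
    return k

def order_profile(group):
    """Map each order to the number of elements of the group having that order."""
    return Counter(order(g) for g in group)

# The four-group: (1), (1 2)(3 4), (1 3)(2 4), (1 4)(2 3), written 0-indexed.
V = [(0, 1, 2, 3), (1, 0, 3, 2), (2, 3, 0, 1), (3, 2, 1, 0)]

# The cyclic group generated by the 4-cycle (1 2 3 4), written 0-indexed.
c = (1, 2, 3, 0)
C4 = [(0, 1, 2, 3), c, compose(c, c), compose(c, compose(c, c))]

print(order_profile(V))
print(order_profile(C4))
```

Both groups have order 4, yet the two profiles differ, so the groups cannot be isomorphic.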

Example 2.89. 

We present two nonisomorphic groups of the same order. 

As in Example 2.65(ii), let $V$ be the four-group consisting of the following
four permutations:

$$V = \{(1),\ (1\ 2)(3\ 4),\ (1\ 3)(2\ 4),\ (1\ 4)(2\ 3)\},$$

and let $\Gamma_4 = \langle i \rangle = \{1, i, -1, -i\}$ be the multiplicative cyclic group of fourth
roots of unity, where $i^2 = -1$. If there were an isomorphism $f : V \to \Gamma_4$, then
surjectivity of $f$ would provide some $x \in V$ with $i = f(x)$. But $x^2 = (1)$ for
all $x \in V$, so that $i^2 = f(x)^2 = f(x^2) = f((1)) = 1$, contradicting $i^2 = -1$.
Therefore, $V$ and $\Gamma_4$ are not isomorphic.

There are other ways to prove this result. For example, $\Gamma_4$ is cyclic and $V$ is
not, or $\Gamma_4$ has an element of order 4 and $V$ does not, or $\Gamma_4$ has a unique element
of order 2, but $V$ has 3 elements of order 2. At this stage, you should really
believe that $\Gamma_4$ and $V$ are not isomorphic! ◄

Definition. If $f : G \to H$ is a homomorphism, define

kernel$^{16}$ $f = \{x \in G : f(x) = 1\}$

and

image $f = \{h \in H : h = f(x) \text{ for some } x \in G\}$.

We usually abbreviate kernel $f$ to $\ker f$ and image $f$ to $\operatorname{im} f$.


Example 2.90. 


(i) If $\Gamma_n = \langle \zeta \rangle$, where $\zeta = e^{2\pi i/n}$ is a primitive $n$th root of unity, then
$f : \mathbb{Z} \to \Gamma_n$, given by $f(m) = \zeta^m$, is a surjective homomorphism with
$\ker f$ all the multiples of $n$.

(ii) If $\Gamma_2$ is the multiplicative group $\Gamma_2 = \{\pm 1\}$, then Theorem 2.39 says that
$\operatorname{sgn} : S_n \to \Gamma_2$ is a homomorphism. The image of sgn is $\{\pm 1\}$; that is, sgn
is surjective, because $\operatorname{sgn}(\tau) = -1$ for a transposition $\tau$; the kernel of sgn
is the alternating group $A_n$, the set of all even permutations.

(iii) Determinant is a homomorphism $\det : \mathrm{GL}(2, \mathbb{R}) \to \mathbb{R}^\times$, the multiplicative
group of nonzero reals. Now $\operatorname{im} \det = \mathbb{R}^\times$, that is, det is surjective,
because if $r \in \mathbb{R}^\times$, then $r = \det\left(\begin{bmatrix} r & 0 \\ 0 & 1 \end{bmatrix}\right)$. The kernel of det is the special
linear group $\mathrm{SL}(2, \mathbb{R})$. [This example can be extended to $\mathrm{GL}(n, \mathbb{R})$; see
Example 2.48(ii).]

(iv) We now generalize the construction of $\ker f$. Recall the definition of inverse
image: if $f : X \to Y$ is a function and if $B \subseteq Y$ is a subset, then

$$f^{-1}(B) = \{x \in X : f(x) \in B\}.$$

If $f : G \to H$ is a homomorphism and if $B \le H$ is a subgroup of $H$,
then the inverse image $f^{-1}(B)$ is a subgroup of $G$. We have $1 \in f^{-1}(B)$,
for $f(1) = 1 \in B \le H$. If $x, y \in f^{-1}(B)$, then $f(x), f(y) \in B$ and
so $f(x)f(y) \in B$; hence, $f(xy) = f(x)f(y) \in B$, and $xy \in f^{-1}(B)$.
Finally, if $x \in f^{-1}(B)$, then $f(x) \in B$; hence, $f(x^{-1}) = f(x)^{-1} \in B$ and
$x^{-1} \in f^{-1}(B)$. In particular, if $B = \{1\}$, then $f^{-1}(B) = f^{-1}(1) = \ker f$.
It follows that if $f : G \to H$ is a homomorphism and $B$ is a subgroup of
$H$, then $f^{-1}(B)$ is a subgroup of $G$ containing $\ker f$. ◄
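Part (ii) can be checked by direct computation in a small symmetric group. The following Python sketch is illustrative only: it computes the sign from the parity of the number of inversions (one standard way to compute it, assumed here rather than taken from the text), takes $n = 4$, and uses hypothetical helper names.

```python
from itertools import permutations

def compose(p, q):
    """Return p∘q (apply q first); permutations are tuples on 0..n-1."""
    return tuple(p[q[i]] for i in range(len(q)))

def sgn(p):
    """Sign of p, computed as (-1) raised to the number of inversions of p."""
    n = len(p)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
    return -1 if inversions % 2 else 1

S4 = list(permutations(range(4)))

# sgn is a homomorphism: sgn(pq) = sgn(p) sgn(q) for all p, q.
is_hom = all(sgn(compose(p, q)) == sgn(p) * sgn(q) for p in S4 for q in S4)

kernel = [p for p in S4 if sgn(p) == 1]   # the even permutations, A_4
image = {sgn(p) for p in S4}

# The kernel is closed under composition, as a subgroup must be.
kernel_closed = all(compose(p, q) in set(kernel) for p in kernel for q in kernel)

print(is_hom, len(kernel), image, kernel_closed)
```

The kernel has 12 of the 24 elements, which matches $|A_4| = 12$.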


16 Kernel comes from the German word meaning “grain” or “seed” (corn comes from the
same word). Its usage here indicates an important ingredient of a homomorphism.

Proposition 2.91. Let $f : G \to H$ be a homomorphism.

(i) $\ker f$ is a subgroup of $G$ and $\operatorname{im} f$ is a subgroup of $H$.

(ii) If $x \in \ker f$ and if $a \in G$, then $axa^{-1} \in \ker f$.

(iii) $f$ is an injection if and only if $\ker f = \{1\}$.

Proof.

(i) Lemma 2.87 shows that $1 \in \ker f$, for $f(1) = 1$. Next, if $x, y \in \ker f$, then
$f(x) = 1 = f(y)$; hence, $f(xy) = f(x)f(y) = 1 \cdot 1 = 1$, and so $xy \in \ker f$.
Finally, if $x \in \ker f$, then $f(x) = 1$ and so $f(x^{-1}) = f(x)^{-1} = 1^{-1} = 1$; thus,
$x^{-1} \in \ker f$, and $\ker f$ is a subgroup of $G$.

We now show that $\operatorname{im} f$ is a subgroup of $H$. First, $1 = f(1) \in \operatorname{im} f$. Next,
if $h = f(x) \in \operatorname{im} f$, then $h^{-1} = f(x)^{-1} = f(x^{-1}) \in \operatorname{im} f$. Finally, if
$k = f(y) \in \operatorname{im} f$, then $hk = f(x)f(y) = f(xy) \in \operatorname{im} f$. Hence, $\operatorname{im} f$ is a
subgroup of $H$.

(ii) If $x \in \ker f$, then $f(x) = 1$ and

$$f(axa^{-1}) = f(a)f(x)f(a)^{-1} = f(a)1f(a)^{-1} = f(a)f(a)^{-1} = 1;$$

therefore, $axa^{-1} \in \ker f$.

(iii) If $f$ is an injection, then $x \ne 1$ implies $f(x) \ne f(1) = 1$, and so $x \notin \ker f$.
Conversely, assume that $\ker f = \{1\}$ and that $f(x) = f(y)$. Then
$1 = f(x)f(y)^{-1} = f(xy^{-1})$, so that $xy^{-1} \in \ker f = \{1\}$; therefore, $xy^{-1} = 1$,
$x = y$, and $f$ is an injection. •

Definition. A subgroup $K$ of a group $G$ is called a normal subgroup if $k \in K$
and $g \in G$ imply $gkg^{-1} \in K$. If $K$ is a normal subgroup of $G$, one writes
$K \lhd G$.

The proposition thus says that the kernel of a homomorphism is always a
normal subgroup. If $G$ is an abelian group, then every subgroup $K$ is normal, for
if $k \in K$ and $g \in G$, then $gkg^{-1} = kgg^{-1} = k \in K$.

The cyclic subgroup $H = \langle (1\ 2) \rangle$ of $S_3$, consisting of the two elements $(1)$
and $(1\ 2)$, is not a normal subgroup of $S_3$: if $\alpha = (1\ 2\ 3)$, then $\alpha^{-1} = (3\ 2\ 1)$,
and

$$\alpha (1\ 2) \alpha^{-1} = (1\ 2\ 3)(1\ 2)(3\ 2\ 1) = (2\ 3) \notin H.$$

On the other hand, the cyclic subgroup K = ((1 2 3)) of S 3 is a normal subgroup, 
as the reader should verify. 
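The verification left to the reader can also be done by brute force. The sketch below is an illustration under stated assumptions: permutations of $\{1, 2, 3\}$ are stored 0-indexed as tuples, and `is_normal` is a hypothetical helper that tests the defining condition $gkg^{-1} \in K$ for every $g$ and $k$.

```python
from itertools import permutations

def compose(p, q):
    """Return p∘q (apply q first); permutations are tuples on 0..n-1."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    """Return the inverse permutation of p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def is_normal(K, G):
    """Check that g k g^{-1} lies in K for every g in G and k in K."""
    K_set = set(K)
    return all(compose(compose(g, k), inverse(g)) in K_set for g in G for k in K)

S3 = list(permutations(range(3)))
e = (0, 1, 2)
t = (1, 0, 2)            # the transposition (1 2), written 0-indexed
a = (1, 2, 0)            # the 3-cycle (1 2 3), written 0-indexed

H = [e, t]               # <(1 2)>
K = [e, a, compose(a, a)]  # <(1 2 3)>

print(is_normal(H, S3), is_normal(K, S3))
```

The same helper works for any finite group given as a list of permutation tuples.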

It follows from Examples 2.90(ii) and 2.90(iii) that $A_n$ is a normal subgroup
of $S_n$ and $\mathrm{SL}(2, \mathbb{R})$ is a normal subgroup of $\mathrm{GL}(2, \mathbb{R})$ (however, it is also easy to
prove these facts directly).

Definition. If $G$ is a group and $a \in G$, then a conjugate of $a$ is any element in
$G$ of the form

$$gag^{-1},$$

where $g \in G$.

It is clear that a subgroup $K \le G$ is a normal subgroup if and only if $K$
contains all the conjugates of its elements: if $k \in K$, then $gkg^{-1} \in K$ for all
$g \in G$. In Proposition 2.33, we showed that $\alpha, \beta \in S_n$ are conjugate in $S_n$ if and
only if they have the same cycle structure.

If $H \le S_n$, then $\alpha, \beta \in H$ being conjugate in $S_n$ (that is, $\alpha$ and $\beta$ have
the same cycle structure) does not imply that $\alpha$ and $\beta$ are conjugate in $H$. For
example, $(1\ 2)(3\ 4)$ and $(1\ 3)(2\ 4)$ are conjugate in $S_4$, but they are not conjugate
in $V$ because the four-group $V$ is abelian.

Remark. In linear algebra, a linear transformation $T : V \to V$, where $V$ is an
$n$-dimensional vector space over $\mathbb{R}$, determines an $n \times n$ matrix $A$ if one uses a
basis of $V$; if one uses another basis, then one gets another matrix $B$ from $T$. It
turns out that $A$ and $B$ are similar; that is, there is a nonsingular matrix $P$ with
$B = PAP^{-1}$. Thus, conjugacy in $\mathrm{GL}(n, \mathbb{R})$ is similarity. ◄

Definition. If $G$ is a group and $g \in G$, define conjugation $\gamma_g : G \to G$ by

$$\gamma_g(a) = gag^{-1}$$

for all $a \in G$.

Proposition 2.92.

(i) If $G$ is a group and $g \in G$, then conjugation $\gamma_g : G \to G$ is an isomorphism.

(ii) Conjugate elements have the same order.

Proof.

(i) If $g, h \in G$, then

$$(\gamma_g \circ \gamma_h)(a) = \gamma_g(hah^{-1}) = g(hah^{-1})g^{-1} = (gh)a(gh)^{-1} = \gamma_{gh}(a);$$

that is,

$$\gamma_g \circ \gamma_h = \gamma_{gh}.$$

It follows that each $\gamma_g$ is a bijection, for $\gamma_g \circ \gamma_{g^{-1}} = \gamma_1 = 1 = \gamma_{g^{-1}} \circ \gamma_g$. We
now show that $\gamma_g$ is an isomorphism: if $a, b \in G$,

$$\gamma_g(ab) = g(ab)g^{-1} = (gag^{-1})(gbg^{-1}) = \gamma_g(a)\gamma_g(b).$$

(ii) To say that $a$ and $b$ are conjugate is to say that there is $g \in G$ with
$b = gag^{-1}$; that is, $b = \gamma_g(a)$. But $\gamma_g$ is an isomorphism, and so Exercise 2.59(ii)
on page 166 shows that $a$ and $b = \gamma_g(a)$ have the same order. •

Example 2.93. 

Define the center of a group $G$, denoted by $Z(G)$, to be

$$Z(G) = \{z \in G : zg = gz \text{ for all } g \in G\};$$

that is, $Z(G)$ consists of all elements commuting with everything in $G$. (Note
that the equation $zg = gz$ can be rewritten as $z = gzg^{-1}$, so that no other
elements in $G$ are conjugate to $z$.)

Let us show that $Z(G)$ is a subgroup of $G$. Clearly $1 \in Z(G)$, for 1 commutes
with everything. If $y, z \in Z(G)$, then $yg = gy$ and $zg = gz$ for all $g \in G$.
Therefore, $(yz)g = y(zg) = y(gz) = (yg)z = g(yz)$, so that $yz$ commutes with
everything, and $yz \in Z(G)$. Finally, if $z \in Z(G)$, then $zg = gz$ for all $g \in G$; in
particular, $zg^{-1} = g^{-1}z$. Therefore,

$$gz^{-1} = (zg^{-1})^{-1} = (g^{-1}z)^{-1} = z^{-1}g$$

(we are using Lemma 2.46: $(ab)^{-1} = b^{-1}a^{-1}$ and $(a^{-1})^{-1} = a$).

The center $Z(G)$ is a normal subgroup: if $z \in Z(G)$ and $g \in G$, then

$$gzg^{-1} = zgg^{-1} = z \in Z(G).$$

A group $G$ is abelian if and only if $Z(G) = G$. At the other extreme are groups
$G$ for which $Z(G) = \{1\}$; such groups are called centerless. For example, it is
easy to see that $Z(S_3) = \{1\}$; indeed, all large symmetric groups are centerless,
for Exercise 2.31 on page 121 shows that $Z(S_n) = \{1\}$ for all $n \ge 3$. ◄
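For small groups the center can be found by testing the defining condition directly. The sketch below is an illustration: permutations are tuples on $\{0, \ldots, n-1\}$ (a 0-indexed convention assumed here), and `center` is a hypothetical helper that returns every element commuting with the whole group.

```python
from itertools import permutations

def compose(p, q):
    """Return p∘q (apply q first); permutations are tuples on 0..n-1."""
    return tuple(p[q[i]] for i in range(len(q)))

def center(G):
    """Elements z of G with zg = gz for every g in G."""
    return [z for z in G if all(compose(z, g) == compose(g, z) for g in G)]

S3 = list(permutations(range(3)))
S4 = list(permutations(range(4)))

# The four-group, written 0-indexed; it is abelian.
V = [(0, 1, 2, 3), (1, 0, 3, 2), (2, 3, 0, 1), (3, 2, 1, 0)]

print(center(S3))
print(center(S4))
print(center(V) == V)
```

For $S_3$ and $S_4$ only the identity survives, while the abelian group $V$ equals its own center.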

Example 2.94. 

The four-group $V$ is a normal subgroup of $S_4$. Recall that the elements of $V$ are

$$V = \{(1),\ (1\ 2)(3\ 4),\ (1\ 3)(2\ 4),\ (1\ 4)(2\ 3)\}.$$

By Proposition 2.32, every conjugate of a product of two transpositions is another
such. But we saw, in Example 2.29, that only 3 permutations in $S_4$ have
this cycle structure, and so $V$ is a normal subgroup of $S_4$. ◄

Proposition 2.95. 

(i) If $H$ is a subgroup of index 2 in a group $G$, then $g^2 \in H$ for every $g \in G$.

(ii) If $H$ is a subgroup of index 2 in a group $G$, then $H$ is a normal subgroup
of $G$.

Proof.

(i) Since $H$ has index 2, there are exactly two cosets, namely, $H$ and $aH$, where
$a \notin H$. Thus, $G$ is the disjoint union $G = H \cup aH$. Take $g \in G$ with $g \notin H$, so
that $g = ah$ for some $h \in H$. If $g^2 \notin H$, then $g^2 = ah'$, where $h' \in H$. Hence,

$$g = g^{-1}g^2 = (ah)^{-1}ah' = h^{-1}a^{-1}ah' = h^{-1}h' \in H,$$

and this is a contradiction.

(ii) It suffices to prove that if $h \in H$, then the conjugate $ghg^{-1} \in H$ for every
$g \in G$. Since $H$ has index 2, there are exactly two cosets, namely, $H$ and $aH$,
where $a \notin H$. Now, either $g \in H$ or $g \in aH$. If $g \in H$, then $ghg^{-1} \in H$,
because $H$ is a subgroup. In the second case, write $g = ax$, where $x \in H$. Then
$ghg^{-1} = a(xhx^{-1})a^{-1} = ah'a^{-1}$, where $h' = xhx^{-1} \in H$ (for $h'$ is a product
of three elements in $H$). If $ghg^{-1} \notin H$, then $ghg^{-1} = ah'a^{-1} \in aH$; that is,
$ah'a^{-1} = ay$ for some $y \in H$. Canceling $a$, we have $h'a^{-1} = y$, which gives
the contradiction $a = y^{-1}h' \in H$. Therefore, if $h \in H$, every conjugate of $h$
also lies in $H$; that is, $H$ is a normal subgroup of $G$. •

Definition. The group of quaternions$^{17}$ is the group $\mathbf{Q}$ of order 8 consisting of
the matrices in $\mathrm{GL}(2, \mathbb{C})$

$$\mathbf{Q} = \{I, A, A^2, A^3, B, BA, BA^2, BA^3\},$$

where $I$ is the identity matrix, $A = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$, and $B = \begin{bmatrix} 0 & i \\ i & 0 \end{bmatrix}$.

The reader should note that the element $A \in \mathbf{Q}$ has order 4, so that $\langle A \rangle$
is a subgroup of order 4 and hence of index 2; the other coset is
$B\langle A \rangle = \{B, BA, BA^2, BA^3\}$.

17 The operations of addition, subtraction, multiplication, and division (by nonzero
numbers) can be extended from $\mathbb{R}$ to the plane in such a way that all the usual laws of arithmetic
hold; of course, the plane is usually called the complex numbers $\mathbb{C}$ in this context. W. R.
Hamilton invented a way of extending all these operations from $\mathbb{C}$ to four-dimensional space
in such a way that all the usual laws of arithmetic still hold (except for commutativity of
multiplication); he called the new “numbers” quaternions (from the Latin word meaning “four”).
The multiplication is determined by knowing how to multiply the particular 4-tuples 1, $i$, $j$,
and $k$:

$$i^2 = -1 = j^2 = k^2;$$

$$ij = k; \quad ji = -k; \quad jk = i; \quad kj = -i;$$

$$ki = j; \quad ik = -j.$$

All the nonzero quaternions form a multiplicative group, and the group of quaternions is
isomorphic to the smallest subgroup (it has order 8) containing these four elements.

Example 2.96.

In Exercise 2.76 on page 167, the reader will check that $\mathbf{Q}$ is a nonabelian group
of order 8. We claim that every subgroup of $\mathbf{Q}$ is normal. Lagrange’s theorem
says that every subgroup of $\mathbf{Q}$ has order a divisor of 8, and so the only possible
orders of subgroups are 1, 2, 4, or 8. Clearly, the subgroup $\{1\}$ and the subgroup
of order 8 (namely, $\mathbf{Q}$ itself) are normal subgroups. By Proposition 2.95(ii), any
subgroup of order 4 must be normal, for it has index 2. Finally, the only element
in $\mathbf{Q}$ having order 2 is $-I$, and so $\langle -I \rangle$ is the only subgroup of order 2. But
this subgroup is normal, for if $M$ is any matrix, then $M(-I) = (-I)M$, so that
$M(-I)M^{-1} = (-I)MM^{-1} = -I \in \langle -I \rangle$. [Exercise 2.76 asks you to prove
that $\langle -I \rangle = Z(\mathbf{Q})$.] ◄
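The claims about $\mathbf{Q}$ can be confirmed by exact computation with the two generating matrices $A = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$ and $B = \begin{bmatrix} 0 & i \\ i & 0 \end{bmatrix}$. The sketch below is illustrative: to avoid floating point, each complex entry $a + bi$ is stored as a pair of integers `(a, b)`; this representation and every helper name are assumptions made for the example.

```python
def cmul(z, w):
    """Product of Gaussian integers z = (a, b) and w = (c, d), meaning a+bi and c+di."""
    return (z[0] * w[0] - z[1] * w[1], z[0] * w[1] + z[1] * w[0])

def cadd(z, w):
    return (z[0] + w[0], z[1] + w[1])

def matmul(M, N):
    """Product of 2x2 matrices whose entries are Gaussian integers."""
    return tuple(
        tuple(cadd(cmul(M[r][0], N[0][c]), cmul(M[r][1], N[1][c])) for c in range(2))
        for r in range(2)
    )

ZERO, ONE, NEG_ONE, I_UNIT = (0, 0), (1, 0), (-1, 0), (0, 1)
I2 = ((ONE, ZERO), (ZERO, ONE))
NEG_I2 = ((NEG_ONE, ZERO), (ZERO, NEG_ONE))
A = ((ZERO, ONE), (NEG_ONE, ZERO))
B = ((ZERO, I_UNIT), (I_UNIT, ZERO))

def generate(gens):
    """All finite products of the generators; in a finite group this is the subgroup they generate."""
    elems, frontier = {I2}, [I2]
    while frontier:
        M = frontier.pop()
        for g in gens:
            P = matmul(M, g)
            if P not in elems:
                elems.add(P)
                frontier.append(P)
    return elems

Q = generate([A, B])
inverse = {M: next(N for N in Q if matmul(M, N) == I2) for M in Q}

order_two = [M for M in Q if M != I2 and matmul(M, M) == I2]

# Every subgroup of Q is generated by at most two elements, so this lists them all.
subgroups = {frozenset(generate([x, y])) for x in Q for y in Q}
all_normal = all(
    matmul(matmul(g, k), inverse[g]) in K for K in subgroups for g in Q for k in K
)

print(len(Q), matmul(A, B) != matmul(B, A), order_two == [NEG_I2], all_normal)
```

The list of subgroup orders, `sorted(len(K) for K in subgroups)`, contains only divisors of 8, in line with Lagrange's theorem.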

Example 2.96 shows that $\mathbf{Q}$ is a nonabelian group which is like abelian
groups in the sense that every subgroup is normal. This is essentially the only
such example: every finite nonabelian group in which every subgroup is normal has the
form $\mathbf{Q} \times A$, where $A$ is an abelian group of a special form: $A = B \times C$, where every
nonidentity element in $B$ has order 2 and every element in $C$ has odd order (direct
products $A \times B$ will be introduced in the next section).

Lagrange’s theorem states that the order of a subgroup of a finite group G 
must be a divisor of | G | . This suggests the question, given some divisor d of | G | , 
whether G must contain a subgroup of order d. The next result shows that there 
need not be such a subgroup. 

Proposition 2.97. The alternating group $A_4$ is a group of order 12 having no
subgroup of order 6.

Proof. First of all, $|A_4| = 12$, by Exercise 2.29 on page 121. If $A_4$ contains a
subgroup $H$ of order 6, then $H$ has index 2, and so $\alpha^2 \in H$ for every $\alpha \in A_4$,
by Proposition 2.95(i). If $\alpha$ is a 3-cycle, however, then $\alpha$ has order 3, so that
$\alpha = \alpha^4 = (\alpha^2)^2$. Thus, $H$ contains every 3-cycle. This is a contradiction, for
there are 8 3-cycles in $A_4$, while $|H| = 6$. •
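Because $A_4$ has only 12 elements, the proposition can also be confirmed by exhaustive search. The sketch below is an illustration, not a replacement for the proof: it tests every nonempty subset of $A_4$ for closure under composition (a nonempty finite subset closed under the operation is a subgroup) and records which sizes occur. The helper names and the 0-indexed tuple representation are assumptions.

```python
from itertools import combinations, permutations

def compose(p, q):
    """Return p∘q (apply q first); permutations are tuples on 0..n-1."""
    return tuple(p[q[i]] for i in range(len(q)))

def is_even(p):
    """A permutation is even when it has an even number of inversions."""
    n = len(p)
    return sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j]) % 2 == 0

def is_subgroup(subset):
    """A nonempty finite subset of a group is a subgroup when it is closed under the operation."""
    S = set(subset)
    return all(compose(a, b) in S for a in S for b in S)

A4 = [p for p in permutations(range(4)) if is_even(p)]

subgroup_orders = [
    k for k in range(1, len(A4) + 1)
    if any(is_subgroup(S) for S in combinations(A4, k))
]

print(len(A4), subgroup_orders)
```

The search examines all 4095 nonempty subsets, which is feasible only because the group is so small.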

Proposition 2.122 will show that if G is an abelian group of order n, then G 
does have a subgroup of order d for every divisor d of n. 


Exercises 

*2.56 If there is a bijection $f : X \to Y$ (that is, if $X$ and $Y$ have the same number of
elements), prove that there is an isomorphism $\varphi : S_X \to S_Y$.

*2.57 (i) Show that the composite of homomorphisms is itself a homomorphism.

(ii) Show that the inverse of an isomorphism is an isomorphism.

(iii) Prove that isomorphism is an equivalence relation on any family of groups.

(iv) Show that two groups that are isomorphic to a third group are isomorphic
to each other.

2.58 Prove that a group $G$ is abelian if and only if the function $f : G \to G$, given by
$f(a) = a^{-1}$, is a homomorphism.

*2.59 This exercise gives some invariants of a group $G$. Let $f : G \to H$ be an
isomorphism.

(i) Prove that if $a \in G$ has infinite order, then so does $f(a)$, and if $a$ has
finite order $n$, then so does $f(a)$. Conclude that if $G$ has an element of
some order $n$ and $H$ does not, then $G \not\cong H$.

(ii) Prove that if $G \cong H$, then, for every divisor $k$ of $|G|$, both $G$ and $H$ have
the same number of elements of order $k$.

2.60 (i) Show that every group $G$ with $|G| < 6$ is abelian.

(ii) Find two nonisomorphic groups of order 6. 

2.61 Prove that a dihedral group of order 4 is isomorphic to V, the 4-group, and a dihe- 
dral group of order 6 is isomorphic to S3. 

*2.62 Prove that any two dihedral groups of order $2n$ are isomorphic.

*2.63 This exercise is for readers familiar with $n \times n$ matrices (see Example 4.65). Define
a function $f : S_n \to \mathrm{GL}(n, \mathbb{R})$ by $f : \alpha \mapsto P_\alpha$, where $P_\alpha$ is the matrix obtained
from the $n \times n$ identity matrix $I$ by permuting its columns by $\alpha$ (the matrix $P_\alpha$
is called a permutation matrix). Prove that $f$ gives an isomorphism of $S_n$ and a
subgroup of $\mathrm{GL}(n, \mathbb{R})$.

2.64 (i) Find a subgroup $H \le S_4$ with $H \cong V$ but with $H \ne V$.

(ii) Prove that the subgroup $H$ in part (i) is not a normal subgroup.

2.65 If $G$ is a group and $a, b \in G$, prove that $ab$ and $ba$ have the same order.

2.66 (i) If $f : G \to H$ is a homomorphism and $x \in G$ has order $k$, prove that
$f(x) \in H$ has order $m$, where $m \mid k$.

(ii) If $f : G \to H$ is a homomorphism and if $(|G|, |H|) = 1$, prove that
$f(x) = 1$ for all $x \in G$.

*2.67 (i) Prove that the special orthogonal group $\mathrm{SO}(2, \mathbb{R})$ is isomorphic to the
circle group $S^1$.

(ii) Prove that all the rotations of the plane about the origin form a group
under composition that is isomorphic to $\mathrm{SO}(2, \mathbb{R})$.

2.68 Let $G$ be the additive group of all polynomials in $x$ with coefficients in $\mathbb{Z}$, and let
$H$ be the multiplicative group of all positive rationals. Prove that $G \cong H$.

*2.69 Show that if $H$ is a subgroup with $bH = Hb = \{hb : h \in H\}$ for every $b \in G$,
then $H$ must be a normal subgroup. (The converse is proved in Lemma 2.110.)

2.70 Prove that the intersection of any family of normal subgroups of a group G is itself 
a normal subgroup of G. 

2.71 Define $W = \langle (1\ 2)(3\ 4) \rangle$, the cyclic subgroup of $S_4$ generated by $(1\ 2)(3\ 4)$.
Show that $W$ is a normal subgroup of $V$, but that $W$ is not a normal subgroup of
$S_4$. Conclude that normality is not transitive: $W \lhd V$ and $V \lhd G$ need not imply
$W \lhd G$.

*2.72 Let $G$ be a finite group written multiplicatively. Prove that if $|G|$ is odd, then every
$x \in G$ has a unique square root; that is, there exists exactly one $g \in G$ with $g^2 = x$.

2.73 Give an example of a group $G$, a subgroup $H \le G$, and an element $g \in G$ with
$[G : H] = 3$ and $g^3 \notin H$.

*2.74 Show that the center of $\mathrm{GL}(2, \mathbb{R})$ is the set of all scalar matrices

$$\begin{bmatrix} a & 0 \\ 0 & a \end{bmatrix}$$

with $a \ne 0$.

*2.75 Let $\zeta = e^{2\pi i/n}$ be a primitive $n$th root of unity, and define

$$A = \begin{bmatrix} \zeta & 0 \\ 0 & \zeta^{-1} \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.$$

(i) Prove that $A$ has order $n$ and that $B$ has order 2.

(ii) Prove that $BAB = A^{-1}$.

(iii) Prove that the matrices of the form $A^i$ and $BA^i$, for $0 \le i < n$, form a
multiplicative subgroup $G \le \mathrm{GL}(2, \mathbb{C})$.

(iv) Prove that each matrix in $G$ has a unique expression of the form $B^i A^j$,
where $i = 0, 1$ and $0 \le j < n$. Conclude that $|G| = 2n$ and that
$G \cong D_{2n}$.


*2.76 Recall that the group of quaternions $\mathbf{Q}$ (defined in Example 2.96) consists of the 8
matrices in $\mathrm{GL}(2, \mathbb{C})$

$$\mathbf{Q} = \{I, A, A^2, A^3, B, BA, BA^2, BA^3\},$$

where $A = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$ and $B = \begin{bmatrix} 0 & i \\ i & 0 \end{bmatrix}$.

(i) Prove that $\mathbf{Q}$ is a nonabelian group with operation matrix multiplication.

(ii) Prove that $-I$ is the only element in $\mathbf{Q}$ of order 2, and that all other
elements $M \ne I$ satisfy $M^2 = -I$.

(iii) Show that $\mathbf{Q}$ has a unique subgroup of order 2, and it is the center of $\mathbf{Q}$.

(iv) Prove that $\langle -I \rangle$ is the center $Z(\mathbf{Q})$.

*2.77 Prove that the quaternions $\mathbf{Q}$ and the dihedral group $D_8$ are nonisomorphic groups
of order 8.

2.78 If $G$ is a finite group generated by two elements of order 2, prove that $G \cong D_{2n}$
for some $n$.

*2.79 Prove that $A_4$ is the only subgroup of $S_4$ of order 12. (In Exercise 2.123 on
page 205, this will be generalized from $S_4$ to $S_n$ for all $n \ge 3$.)

*2.80 (i) Let $\mathcal{A}$ be the set of all $2 \times 2$ matrices of the form $A = \begin{bmatrix} a & b \\ 0 & 1 \end{bmatrix}$, where
$a \ne 0$. Prove that $\mathcal{A}$ is a subgroup of $\mathrm{GL}(2, \mathbb{R})$.

(ii) Prove that $\psi : \mathrm{Aff}(1, \mathbb{R}) \to \mathcal{A}$, defined by $f \mapsto A$, is an isomorphism,
where $f(x) = ax + b$.

(iii) Prove that the stochastic group $\Sigma(2, \mathbb{R})$ [see Exercise 2.42 on page 144] is
isomorphic to the affine group $\mathrm{Aff}(1, \mathbb{R})$ by showing that $\varphi : \Sigma(2, \mathbb{R}) \to
\mathcal{A} \cong \mathrm{Aff}(1, \mathbb{R})$, given by $\varphi(M) = QMQ^{-1}$, is an isomorphism, where

$$Q = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix} \quad \text{and} \quad Q^{-1} = \begin{bmatrix} 1 & 0 \\ -1 & 1 \end{bmatrix}.$$

2.81 Prove that the symmetry group $\Sigma(\pi_n)$, where $\pi_n$ is a regular polygon with $n$
vertices, is isomorphic to a subgroup of $S_n$.

2.82 An automorphism of a group $G$ is an isomorphism $G \to G$.

(i) Prove that $\mathrm{Aut}(G)$, the set of all the automorphisms of a group $G$, is a
group under composition.

(ii) Prove that $\gamma : G \to \mathrm{Aut}(G)$, defined by $g \mapsto \gamma_g$ (conjugation by $g$), is a
homomorphism.

(iii) Prove that $\ker \gamma = Z(G)$.

(iv) Prove that $\operatorname{im} \gamma \lhd \mathrm{Aut}(G)$.

2.83 If $G$ is a group, prove that $\mathrm{Aut}(G) = \{1\}$ if and only if $|G| \le 2$.

2.84 If $C$ is a finite cyclic group of order $n$, prove that $|\mathrm{Aut}(C)| = \varphi(n)$.


2.6 Quotient Groups 

We are now going to construct a group using congruence mod m . Once this is 
done, we will be able to give a proof of Fermat’s theorem using group theory. 
This construction is the prototype of a more general way of building new groups 
from given groups, called quotient groups. 

Definition. Given $m \ge 2$ and $a \in \mathbb{Z}$, the congruence class of $a$ mod $m$ is the
subset $[a]$ of $\mathbb{Z}$:

$$[a] = \{b \in \mathbb{Z} : b \equiv a \bmod m\}$$

$$= \{a + km : k \in \mathbb{Z}\}$$

$$= \{\ldots, a - 2m, a - m, a, a + m, a + 2m, \ldots\};$$

the integers mod $m$, denoted by $\mathbb{I}_m$, is the family of all such congruence classes.$^{18}$

For example, if $m = 2$, then $[0] = \{b \in \mathbb{Z} : b \equiv 0 \bmod 2\}$ is the set of all the
even integers and $[1] = \{b \in \mathbb{Z} : b \equiv 1 \bmod 2\}$ is the set of all the odd integers.
Notice that $[2] = \{2 + 2k : k \in \mathbb{Z}\}$ is also the set of all even integers, so that
$[2] = [0]$; indeed, $[0] = [2] = [-2] = [4] = [-4] = [6] = [-6] = \cdots$.


Remark. Given $m$, we may form the cyclic subgroup $\langle m \rangle$ of $\mathbb{Z}$ generated by
$m$. The reader should check that the congruence class $[a]$ is precisely the coset
$a + \langle m \rangle$. ◄

18 The two most popular notations for the integers mod $m$ are $\mathbb{Z}/m\mathbb{Z}$ and $\mathbb{Z}_m$. Both notations
are good ones: the first reminds us that the group is a quotient group of $\mathbb{Z}$, but the notation is
cumbersome; the second notation is compact, but it causes confusion because it is also used by
number theorists, when $m$ is a prime $p$, to denote all the rational numbers whose denominator
is prime to $p$ (the ring of $p$-adic fractions). In fact, many number theorists denote the ring of
$p$-adic integers by $\mathbb{Z}_p$. To avoid possible confusion, I am introducing the notation $\mathbb{I}_m$.

The notation $[a]$ is incomplete in that it does not mention the modulus $m$:
for example, $[1]$ in $\mathbb{I}_2$ is not the same as $[1]$ in $\mathbb{I}_3$ (the former is the set of all odd
numbers and the latter is $\{1 + 3k : k \in \mathbb{Z}\} = \{\ldots, -2, 1, 4, 7, \ldots\}$). This will
not cause problems, for, almost always, one works with only one $\mathbb{I}_m$ at a time.
However, if there is a danger of confusion, as in Theorem 2.126, we will denote
the congruence class of $a$ in $\mathbb{I}_m$ by $[a]_m$. The next proposition is a special case
of Lemma 2.19.


Proposition 2.98. $[a] = [b]$ in $\mathbb{I}_m$ if and only if $a \equiv b \bmod m$.

Proof. If $[a] = [b]$, then $a \in [a]$, by reflexivity, and so $a \in [a] = [b]$.
Therefore, $a \equiv b \bmod m$.

Conversely, assume that $a \equiv b \bmod m$. If $c \in [a]$, then $c \equiv a \bmod m$, and so
transitivity gives $c \equiv b \bmod m$; hence $[a] \subseteq [b]$. By symmetry, $b \equiv a \bmod m$, and
this gives the reverse inclusion $[b] \subseteq [a]$. Thus, $[a] = [b]$. •

In words, Proposition 2.98 says that congruence mod $m$ between numbers
can be converted into equality at the cost of replacing numbers by congruence
classes.

In particular, $[a] = [0]$ in $\mathbb{I}_m$ if and only if $a \equiv 0 \bmod m$; that is, $[a] = [0]$
in $\mathbb{I}_m$ if and only if $m$ is a divisor of $a$.


Proposition 2.99. Let $m \ge 2$ be given.

(i) If $a \in \mathbb{Z}$, then $[a] = [r]$ for some $r$ with $0 \le r < m$.

(ii) If $0 \le r' < r < m$, then $[r'] \ne [r]$.

(iii) $\mathbb{I}_m$ has exactly $m$ elements, namely, $[0], [1], \ldots, [m - 1]$.

Proof.

(i) For each $a \in \mathbb{Z}$, the division algorithm gives $a = qm + r$, where $0 \le r < m$;
hence $a - r = qm$ and $a \equiv r \bmod m$. Therefore $[a] = [r]$, where $r$ is the
remainder after dividing $a$ by $m$.

(ii) Proposition 1.55(ii) gives $r' \not\equiv r \bmod m$.

(iii) Part (i) shows that every $[a]$ in $\mathbb{I}_m$ occurs on the list $[0], [1], [2], \ldots, [m - 1]$;
part (ii) shows that this list of $m$ items has no repetitions. •

We are now going to make $\mathbb{I}_m$ into an abelian group by equipping it with
an addition. Now Proposition 2.98 says that $[a] = [b]$ in $\mathbb{I}_m$ if and only if
$a \equiv b \bmod m$, so that each $[a] \in \mathbb{I}_m$ has many names. The operation we propose
to define on $\mathbb{I}_m$ will appear to depend on choices of names, and so we will be
obliged to prove that the operation is well-defined.

Lemma 2.100. If $m \ge 2$, then the function $\alpha : \mathbb{I}_m \times \mathbb{I}_m \to \mathbb{I}_m$, given by

$$\alpha([a], [b]) = [a + b],$$

is an operation on $\mathbb{I}_m$.

Proof. The operation appears to depend on choosing names $[a]$ and $[b]$: what
if we had chosen names $[a']$ and $[b']$? To see that $\alpha$ is a (well-defined) function,
we must show that if $[a] = [a']$ and $[b] = [b']$, then $\alpha([a], [b]) = \alpha([a'], [b'])$,
that is, $[a + b] = [a' + b']$. But this is precisely Proposition 1.57(i). •
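Well-definedness can be made concrete by testing whether a rule on representatives gives the same class no matter which names are chosen. The following Python sketch is illustrative only: the function `is_well_defined`, the window of representatives it samples, and the contrasting rule `max` are all assumptions introduced here, and a finite check of this kind is evidence, not a proof.

```python
def same_class(a, b, m):
    """True when a and b are congruent mod m, that is, [a] = [b]."""
    return (a - b) % m == 0

def is_well_defined(rule, m, reps=range(-10, 11)):
    """Test, on a finite window of representatives, that [rule(a, b)] depends only on [a] and [b]."""
    return all(
        same_class(rule(a, b), rule(a2, b2), m)
        for a in reps for a2 in reps if same_class(a, a2, m)
        for b in reps for b2 in reps if same_class(b, b2, m)
    )

m = 7
addition_ok = is_well_defined(lambda a, b: a + b, m)   # the rule of Lemma 2.100
maximum_ok = is_well_defined(lambda a, b: max(a, b), m)  # a rule that depends on the names

print(addition_ok, maximum_ok)
```

The rule $([a], [b]) \mapsto [\max(a, b)]$ fails because, for instance, $[0] = [7]$ but $\max(0, 1) = 1$ and $\max(7, 1) = 7$ lie in different classes mod 7.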

Proposition 2.101. $\mathbb{I}_m$ is an additive cyclic group of order $m$ with generator $[1]$.

Proof.

In this proof only, we shall write $\boxplus$ for addition of congruence classes:

$$\alpha([a], [b]) = [a] \boxplus [b] = [a + b].$$

Associativity of the operation $\boxplus$ follows from associativity of ordinary
addition:

$$[a] \boxplus ([b] \boxplus [c]) = [a] \boxplus [b + c]$$

$$= [a + (b + c)]$$

$$= [(a + b) + c]$$

$$= [a + b] \boxplus [c]$$

$$= ([a] \boxplus [b]) \boxplus [c].$$

Commutativity of the operation $\boxplus$ follows from commutativity of ordinary
addition:

$$[a] \boxplus [b] = [a + b] = [b + a] = [b] \boxplus [a].$$

The identity element is $[0]$: since 0 is the (additive) identity in $\mathbb{Z}$,

$$[0] \boxplus [a] = [0 + a] = [a].$$

The inverse of $[a]$ is $[-a]$; since $-a$ is the additive inverse of $a$ in $\mathbb{Z}$,

$$[-a] \boxplus [a] = [-a + a] = [0].$$

Therefore, $\mathbb{I}_m$ is an abelian group of order $m$; it is cyclic with generator $[1]$,
for if $0 \le r < m$, then $[r] = [1] + \cdots + [1]$ is the congruence class obtained by
adding $[1]$ to itself $r$ times. •

The reader should notice that the group axioms in $\mathbb{I}_m$ are “inherited” from
the group axioms in $\mathbb{Z}$.

Here is an alternative construction of the group $\mathbb{I}_m$. Define $G_m$ to be the set
$\{0, 1, \ldots, m - 1\}$, and define an operation on $G_m$ by:

$$a \boxplus b = \begin{cases} a + b & \text{if } a + b \le m - 1; \\ a + b - m & \text{if } a + b > m - 1. \end{cases}$$

Although this definition is simpler than what we have just done, proving
associativity is now very tedious. It is also more awkward to use, for proofs usually
require case analyses (for example, see Example 2.88).
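The case-by-case operation on $G_m = \{0, 1, \ldots, m-1\}$ is easy to state in code, and the tedious associativity check can be delegated to a brute-force loop for small $m$. This is a sketch for illustration; the function names are hypothetical, and checking finitely many moduli does not replace a proof.

```python
def box_add(a, b, m):
    """The operation on G_m = {0, 1, ..., m-1}: add, then subtract m if the sum exceeds m - 1."""
    return a + b if a + b <= m - 1 else a + b - m

def is_associative(m):
    """Check (a+b)+c = a+(b+c) for every triple in G_m."""
    R = range(m)
    return all(
        box_add(box_add(a, b, m), c, m) == box_add(a, box_add(b, c, m), m)
        for a in R for b in R for c in R
    )

def agrees_with_remainder(m):
    """The operation coincides with taking the remainder of a + b after division by m."""
    R = range(m)
    return all(box_add(a, b, m) == (a + b) % m for a in R for b in R)

print(all(is_associative(m) for m in range(2, 13)))
print(all(agrees_with_remainder(m) for m in range(2, 13)))
```

The second check shows that the two constructions agree: $a \boxplus b$ is the remainder $r$ with $[a] + [b] = [r]$ and $0 \le r < m$.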

We now drop the notation $\boxplus$; henceforth, we shall write

$$[a] + [b] = [a + b]$$

for the sum of congruence classes in $\mathbb{I}_m$.

Corollary 2.102. Every cyclic group of order $m \ge 2$ is isomorphic to $\mathbb{I}_m$.

Proof. We have already seen, in Example 2.88, that any two finite cyclic groups
of the same order are isomorphic. •

We now focus on multiplication. 

Proposition 2.103. The function $\mu : \mathbb{I}_m \times \mathbb{I}_m \to \mathbb{I}_m$, given by

$$\mu([a], [b]) = [ab],$$

is an operation on $\mathbb{I}_m$. This operation is associative and commutative, and $[1]$ is
an identity element.

Proof. The operation appears to depend on choosing names $[a]$ and $[b]$: what
if we had chosen names $[a']$ and $[b']$? To see that $\mu$ is a (well-defined) function,
we must show that if $[a] = [a']$ and $[b] = [b']$, then $\mu([a], [b]) = \mu([a'], [b'])$,
that is, $[ab] = [a'b']$. But this is precisely Proposition 1.57(ii).

In this proof only, we are going to write $\boxtimes$ for multiplication of congruence
classes:

$$\mu([a], [b]) = [a] \boxtimes [b] = [ab].$$

Associativity of the operation $\boxtimes$ follows from associativity of ordinary
multiplication:

$$[a] \boxtimes ([b] \boxtimes [c]) = [a] \boxtimes [bc]$$

$$= [a(bc)]$$

$$= [(ab)c]$$

$$= [ab] \boxtimes [c]$$

$$= ([a] \boxtimes [b]) \boxtimes [c].$$

Commutativity of $\boxtimes$ follows from commutativity of ordinary multiplication:

$$[a] \boxtimes [b] = [ab] = [ba] = [b] \boxtimes [a].$$

The identity element is $[1]$, because

$$[1] \boxtimes [a] = [1a] = [a]$$

for all $a \in \mathbb{Z}$. •

We now drop the notation $\boxtimes$; henceforth, we shall write

$$[a][b] = [ab]$$

for the product of congruence classes in $\mathbb{I}_m$ instead of $[a] \boxtimes [b]$. Note that $\mathbb{I}_m$ is
not a group under multiplication because some elements, e.g., $[0]$, do not have
inverses.

Proposition 2.104. 

(i) If $(a, m) = 1$, then $[a][x] = [b]$ can be solved for $[x]$ in $\mathbb{I}_m$.

(ii) If $p$ is a prime, then $\mathbb{I}_p^\times$, the set of nonzero elements in $\mathbb{I}_p$, is a multiplicative
abelian group of order $p - 1$.

Proof.

(i) By Theorem 1.65, the congruence $ax \equiv b \bmod m$ can be solved for $x$ if
$(a, m) = 1$; that is, $[a][x] = [b]$ can be solved for $[x]$ in $\mathbb{I}_m$ when $a$ and $m$ are
relatively prime. (Recall that if $sa + tm = 1$, then $[x] = [sb]$.)

(ii) Assume that $m = p$ is prime; if $0 < a < p$, then $(a, p) = 1$ and the equation
$[a][x] = [1]$ can be solved in $\mathbb{I}_p$, by part (i); that is, $[a]$ has an inverse in $\mathbb{I}_p$.
We have proved that $\mathbb{I}_p^\times$ is an abelian group; its order is $p - 1$ because, as a set,
it is obtained from $\mathbb{I}_p$ by throwing away one element, namely, $[0]$. •
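The recipe recalled in part (i), namely that $sa + tm = 1$ gives the solution $[x] = [sb]$, can be turned into a short program. The sketch below uses the extended Euclidean algorithm to find $s$; the function names are hypothetical and the specific moduli are chosen only for illustration.

```python
def extended_gcd(a, b):
    """Return (d, s, t) with d = gcd(a, b) and d = s*a + t*b, for nonnegative a, b."""
    if b == 0:
        return (a, 1, 0)
    d, s, t = extended_gcd(b, a % b)
    # d = s*b + t*(a % b) = t*a + (s - (a // b)*t)*b
    return (d, t, s - (a // b) * t)

def solve_congruence(a, b, m):
    """Return the representative x with 0 <= x < m and a*x ≡ b (mod m), assuming (a, m) = 1."""
    d, s, _ = extended_gcd(a % m, m)
    if d != 1:
        raise ValueError("a and m are not relatively prime")
    return (s * b) % m

x = solve_congruence(7, 3, 12)
print(x, (7 * x) % 12)

# Every nonzero class mod a prime p has an inverse, as in part (ii).
p = 11
inverses = {a: solve_congruence(a, 1, p) for a in range(1, p)}
print(inverses)
```

Calling `solve_congruence` with $b = 1$ returns the multiplicative inverse of $[a]$, which is exactly what part (ii) needs.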

In Theorem 3.122 we will prove, for every prime $p$, that $\mathbb{I}_p^\times$ is a cyclic group
of order $p - 1$.

We now give a new proof of Fermat’s theorem which is entirely different
from our earlier proof, Theorem 1.61.

Corollary 2.105 (Fermat). If $p$ is a prime and $a \in \mathbb{Z}$, then

$$a^p \equiv a \bmod p.$$

Proof. By Proposition 2.98, it suffices to show that $[a^p] = [a]$ in $\mathbb{I}_p$. If $[a] = [0]$,
then Proposition 2.103 gives $[a^p] = [a]^p = [0]^p = [0] = [a]$. If $[a] \ne [0]$,
then $[a] \in \mathbb{I}_p^\times$, the multiplicative group of nonzero elements in $\mathbb{I}_p$. By
Corollary 2.84 to Lagrange’s theorem, $[a]^{p-1} = [1]$, because $|\mathbb{I}_p^\times| = p - 1$.
Multiplying by $[a]$ gives the desired result $[a^p] = [a]^p = [a]$. Therefore,
$a^p \equiv a \bmod p$. •

Note that if $m \ge 2$ is not a prime, then $(\mathbb{I}_m)^\times$ is not a group: if $m = ab$,
where $1 < a, b < m$, then $[a], [b] \in (\mathbb{I}_m)^\times$, but their product $[a][b] = [ab] =
[m] = [0] \notin (\mathbb{I}_m)^\times$. We are now going to define an analog of $\mathbb{I}_p^\times$ that can be used
to generalize Fermat’s theorem.


Definition. Let $U(\mathbb{I}_m)$ be the set of all those congruence classes in $\mathbb{I}_m$ having
an inverse; that is, $[a] \in U(\mathbb{I}_m)$ if there is $[s] \in \mathbb{I}_m$ with $[s][a] = [1]$.

Recall that $\varphi(m) = |\{k \in \mathbb{Z} : 1 \le k \le m \text{ and } (k, m) = 1\}|$.


Lemma 2.106. 

(i) $U(\mathbb{I}_m) = \{[r] \in \mathbb{I}_m : (r, m) = 1\}$.

(ii) $U(\mathbb{I}_m)$ is a multiplicative abelian group of order $\varphi(m)$.

Proof.

(i) Let $E = \{[r] \in \mathbb{I}_m : (r, m) = 1\}$. If $[r] \in E$, then $(r, m) = 1$, so there
are integers $s$ and $t$ with $sr + tm = 1$. Hence, $sr \equiv 1 \bmod m$. Therefore,
$[sr] = [s][r] = [1]$, and so $[r] \in U(\mathbb{I}_m)$. For the reverse inclusion, assume that
$[r] \in U(\mathbb{I}_m)$; that is, there is $[s] \in U(\mathbb{I}_m)$ with $[s][r] = [1]$. But $[s][r] = [sr] =
[1]$, so that $m \mid (sr - 1)$; that is, there is an integer $t$ with $tm = sr - 1$. By
Exercise 1.51 on page 52, $(r, m) = 1$, and so $[r] \in E$.

(ii) By Exercise 1.53 on page 52, $(r, m) = 1 = (r', m)$ implies $(rr', m) = 1$.
Hence, $[r]$ and $[r']$ in $U(\mathbb{I}_m)$ imply $[r][r'] = [rr'] \in U(\mathbb{I}_m)$, so that
multiplication is an operation on $U(\mathbb{I}_m)$. Proposition 2.103 shows that multiplication is
associative and commutative, and that $[1]$ is the identity. By Proposition 2.104(i),
the equation $[r][x] = [1]$ can be solved for $[x] \in \mathbb{I}_m$. That is, each $[r] \in U(\mathbb{I}_m)$
has an inverse. Therefore, $U(\mathbb{I}_m)$ is an abelian group and, by Proposition 1.39,
its order is $|U(\mathbb{I}_m)| = \varphi(m)$. •

If $p$ is a prime, then $\varphi(p) = p - 1$, and $U(\mathbb{I}_p) = (\mathbb{I}_p)^\times$.


Theorem 2.107 (Euler). If $(r, m) = 1$, then

$$r^{\varphi(m)} \equiv 1 \bmod m.$$

Proof. If $G$ is a finite group of order $n$, then Corollary 2.84 to Lagrange’s
theorem gives $x^n = 1$ for all $x \in G$. Here, if $[r] \in U(\mathbb{I}_m)$, then $[r]^{\varphi(m)} = [1]$,
by Lemma 2.106. In congruence notation, this says that if $(r, m) = 1$, then
$r^{\varphi(m)} \equiv 1 \bmod m$. •

Example 2.108.

It is straightforward to see that

$$U(\mathbb{I}_8) = \{[1], [3], [5], [7]\} \cong V,$$

for $[3]^2 = [9] = [1]$, $[5]^2 = [25] = [1]$, and $[7]^2 = [49] = [1]$.

Moreover,

$$U(\mathbb{I}_{10}) = \{[1], [3], [7], [9]\} \cong \mathbb{I}_4,$$

for $[3]^4 = [81] = [1]$, while $[3]^2 = [9] = [-1] \ne [1]$. ◄
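Euler's theorem and the two unit groups of Example 2.108 can be inspected numerically. The sketch below is an illustration: `units` lists representatives $r$ with $(r, m) = 1$, `element_order` finds multiplicative orders by repeated multiplication, and both names are hypothetical helpers.

```python
from math import gcd

def units(m):
    """Representatives r in 1..m-1 of the classes in U(I_m), for m >= 2."""
    return [r for r in range(1, m) if gcd(r, m) == 1]

def element_order(r, m):
    """Multiplicative order of [r] in U(I_m); r must be prime to m."""
    k, power = 1, r % m
    while power != 1:
        power = (power * r) % m
        k += 1
    return k

def euler_holds(m):
    """Check r^phi(m) ≡ 1 (mod m) for every r prime to m, with phi(m) = |U(I_m)|."""
    phi = len(units(m))
    return all(pow(r, phi, m) == 1 for r in units(m))

print(all(euler_holds(m) for m in range(2, 200)))

# Example 2.108: every nonidentity element of U(I_8) has order 2,
# while U(I_10) contains an element of order 4 and so is cyclic.
print(units(8), sorted(element_order(r, 8) for r in units(8)))
print(units(10), sorted(element_order(r, 10) for r in units(10)))
```

The order lists distinguish the two groups of order 4: the first matches the four-group, the second a cyclic group.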

Theorem 2.109 (Wilson's Theorem). An integer p is a prime if and only if

    (p − 1)! ≡ −1 mod p.

Proof. Assume that p is a prime. If a_1, a_2, ..., a_n is a list of all the elements of
a finite abelian group G, then the product a_1 a_2 ⋯ a_n is the same as the product
of all elements a with a^2 = 1, for any other element cancels against its inverse.
Since p is prime, Exercise 1.81 on page 71 implies that (I_p)^× has only one element
of order 2, namely, [−1]. It follows that the product of all the elements in (I_p)^×,
namely, [(p − 1)!], is equal to [−1]; therefore, (p − 1)! ≡ −1 mod p.

Conversely, if (m − 1)! ≡ −1 mod m, then (m, (m − 1)!) = 1. If m is
composite, then there is an integer a | m with 1 < a ≤ m − 1. Now a | a!
implies a | (m − 1)!. Thus, a > 1 is a common divisor of m and (m − 1)!, a
contradiction. Therefore, m is prime. •
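
Both directions of Wilson's theorem can be tested on small integers. This is a hedged sketch; `is_prime` and `wilson_congruence` are hypothetical helper names, and trial division is used only because the range is small.

```python
from math import factorial, isqrt

def is_prime(n):
    """Trial division; adequate for the small range checked here."""
    return n >= 2 and all(n % d for d in range(2, isqrt(n) + 1))

def wilson_congruence(n):
    """True when (n - 1)! ≡ -1 (mod n)."""
    return factorial(n - 1) % n == n - 1

# The congruence should hold exactly at the primes.
agree = all(wilson_congruence(n) == is_prime(n) for n in range(2, 500))
print(agree)
```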

Remark. One can generalize Wilson's theorem in the same way that Euler's
theorem generalizes Fermat's theorem: replace U(I_p) by U(I_m). For example,
one can prove, for all m ≥ 3, that U(I_{2^m}) has exactly 3 elements of order 2,
namely, [−1], [1 + 2^(m−1)], and [−(1 + 2^(m−1))]. It now follows that the product
of all the odd numbers r, where 1 ≤ r < 2^m, is congruent to 1 mod 2^m, because

    (−1)(1 + 2^(m−1))(−1 − 2^(m−1)) = (1 + 2^(m−1))^2

                                    = 1 + 2^m + 2^(2m−2) ≡ 1 mod 2^m. ◄
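
The claims in this remark are easy to test for small exponents. The sketch below uses illustrative function names (`odd_product`, `involutions`, `predicted`) that do not come from the text.

```python
from math import prod

def odd_product(m):
    """Product of all odd r with 1 <= r < 2^m, reduced mod 2^m."""
    return prod(range(1, 2**m, 2)) % 2**m

def involutions(m):
    """Odd r != 1 below 2^m with r^2 ≡ 1 (mod 2^m): the elements of order 2."""
    n = 2**m
    return [r for r in range(3, n, 2) if r * r % n == 1]

def predicted(m):
    """Representatives of [-1], [1 + 2^(m-1)], and [-(1 + 2^(m-1))]."""
    n, h = 2**m, 2**(m - 1)
    return sorted([n - 1, 1 + h, n - (1 + h)])

print(all(odd_product(m) == 1 for m in range(3, 12)))
print(all(involutions(m) == predicted(m) for m in range(3, 12)))
```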

The homomorphism π: Z → I_m, defined by π: a ↦ [a], is surjective, so
that I_m is equal to im π. Thus, every element of I_m has the form π(a) for some
a ∈ Z, and π(a) + π(b) = π(a + b). This description of the additive group I_m
in terms of the additive group Z can be generalized to arbitrary, not necessarily
abelian, groups. Suppose that f: G → H is a surjective homomorphism between
groups G and H. Since f is surjective, each element of H has the form
f(a) for some a ∈ G, and the operation in H is given by f(a)f(b) = f(ab),
where a, b ∈ G. Now K = ker f is a normal subgroup of G, and we are going
to reconstruct H = im f (as well as a surjective homomorphism π: G → H)
from G and K alone.

We begin by introducing an operation on the set

    S(G)

of all nonempty subsets of a group G. If X, Y ∈ S(G), define

    XY = {xy : x ∈ X and y ∈ Y}.

This multiplication is associative: X(YZ) is the set of all x(yz), where x ∈ X,
y ∈ Y, and z ∈ Z, (XY)Z is the set of all such (xy)z, and these subsets are the
same because associativity in G says that their elements are the same.

An instance of this multiplication is the product of a one-point subset {a}
and a subgroup H ≤ G, which is the coset aH.

As a second example, we show that if H is any subgroup of G, then

    HH = H.

If h, h′ ∈ H, then hh′ ∈ H, because subgroups are closed under multiplication,
and so HH ⊆ H. For the reverse inclusion, if h ∈ H, then h = h1 ∈ HH
(because 1 ∈ H), and so H ⊆ HH.

It is possible for two subsets X and Y in S(G) to commute even though
their constituent elements do not commute. One example has just been given;
take X = Y = H, where H is a nonabelian subgroup of G. Here is a more
interesting example: let G = S_3 and K = ⟨(1 2 3)⟩. Now (1 2) does not
commute with (1 2 3) ∈ K, but we claim that (1 2)K = K(1 2). In fact, let us
prove the converse of Exercise 2.69 on page 166, so that H ◁ G if and only if
every left coset of H in G is a right coset.


Lemma 2.110. If K is a normal subgroup of a group G, then

    bK = Kb

for every b ∈ G.

Proof. Let bk ∈ bK. Since K is normal, bkb⁻¹ ∈ K, say bkb⁻¹ = k′ ∈ K, so
that bk = (bkb⁻¹)b = k′b ∈ Kb, and so bK ⊆ Kb. For the reverse inclusion,
let kb ∈ Kb. Since K is normal, (b⁻¹)k(b⁻¹)⁻¹ = b⁻¹kb ∈ K, say b⁻¹kb =
k″ ∈ K. Hence, kb = b(b⁻¹kb) = bk″ ∈ bK and Kb ⊆ bK. Therefore,
bK = Kb when K ◁ G. •


Here is a fundamental construction of a new group from a given group. 




Theorem 2.111. Let G/K denote the family of all the cosets of a subgroup K
of G. If K is a normal subgroup, then

    aKbK = abK

for all a, b ∈ G, and G/K is a group under this operation.

Remark. The group G/K is called the quotient group G mod K; when G is
finite, its order |G/K| is the index [G : K] = |G|/|K| (presumably, this is the
reason quotient groups are so called). ◄

Proof. The product of two cosets (aK)(bK) can also be viewed as the product
of 4 elements in S(G). Hence, associativity in S(G) gives generalized associativity:

    (aK)(bK) = a(Kb)K = a(bK)K = abKK = abK,

for normality of K gives Kb = bK for all b ∈ G, by Lemma 2.110, while KK =
K because K is a subgroup. Thus, the product of two cosets of K is again a coset
of K, and so an operation on G/K has been defined. Because multiplication in
S(G) is associative, equality X(YZ) = (XY)Z holds, in particular, when X, Y,
and Z are cosets of K, so that the operation on G/K is associative. The identity
is the coset K = 1K, for (1K)(bK) = 1bK = bK, and the inverse of aK is
a⁻¹K, for (a⁻¹K)(aK) = a⁻¹aK = K. Therefore, G/K is a group. •
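
The role of normality in Theorem 2.111 can be seen concretely in S_3. The sketch below makes several assumptions that are not in the text: permutations of {0, 1, 2} are stored as tuples, K is the subgroup of 3-cycles, H is generated by a transposition, and all function names are illustrative.

```python
from itertools import permutations

def compose(p, q):
    """(p∘q)(i) = p(q(i)); permutations of {0, 1, 2} stored as tuples."""
    return tuple(p[q[i]] for i in range(len(q)))

def set_product(X, Y):
    """The product XY = {xy : x in X, y in Y} of two subsets."""
    return frozenset(compose(x, y) for x in X for y in Y)

G = list(permutations(range(3)))
K = frozenset({(0, 1, 2), (1, 2, 0), (2, 0, 1)})   # identity and the 3-cycles (normal)
H = frozenset({(0, 1, 2), (1, 0, 2)})              # generated by a transposition (not normal)

def coset(a, S):
    return set_product({a}, S)

def left_equals_right(S):
    """Is bS = Sb for every b in G, as in Lemma 2.110?"""
    return all(set_product({b}, S) == set_product(S, {b}) for b in G)

def products_are_cosets(S):
    """Is (aS)(bS) = abS for all a, b in G?"""
    return all(set_product(coset(a, S), coset(b, S)) == coset(compose(a, b), S)
               for a in G for b in G)

quotient = {coset(a, K) for a in G}
print(len(quotient), left_equals_right(K), products_are_cosets(K))
print(left_equals_right(H), products_are_cosets(H))
```

For the normal subgroup K the product of two cosets is again a coset and G/K has [G : K] = 2 elements; for H the same recipe fails, so coset multiplication does not define an operation.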

Example 2.112.

We show that the quotient group Z/⟨m⟩ is precisely I_m, where ⟨m⟩ is the (cyclic)
subgroup consisting of all the multiples of a positive integer m. Since Z is
abelian, ⟨m⟩ is necessarily a normal subgroup. The sets Z/⟨m⟩ and I_m coincide
because they consist of the same elements: the coset a + ⟨m⟩ is the
congruence class [a]:

    a + ⟨m⟩ = {a + km : k ∈ Z} = [a].

The operations also coincide: addition in Z/⟨m⟩ is given by

    (a + ⟨m⟩) + (b + ⟨m⟩) = (a + b) + ⟨m⟩;

since a + ⟨m⟩ = [a], this last equation is just [a] + [b] = [a + b], which is the
sum in I_m. Therefore, I_m is equal to the quotient group Z/⟨m⟩. ◄

We remind the reader of Lemma 2.80(i): if K is a subgroup of G, then two
cosets aK and bK are equal if and only if b⁻¹a ∈ K. In particular, if b = 1,
then aK = K if and only if a ∈ K.

We can now prove the converse of Proposition 2.91(ii).

Corollary 2.113. Every normal subgroup K ◁ G is the kernel of some
homomorphism.

Proof. Define the natural map π: G → G/K by π(a) = aK. With this
notation, the formula aKbK = abK can be rewritten as π(a)π(b) = π(ab);
thus, π is a (surjective) homomorphism. Since K is the identity element in G/K,

    ker π = {a ∈ G : π(a) = K} = {a ∈ G : aK = K} = K,

by Lemma 2.80(i). •

The next theorem shows that every homomorphism gives rise to an isomor- 
phism, and that quotient groups are merely constructions of homomorphic im- 
ages. It was E. Noether (1882-1935) who emphasized the fundamental impor- 
tance of this fact. 

Theorem 2.114 (First Isomorphism Theorem). If f: G → H is a
homomorphism, then

    ker f ◁ G and G/ker f ≅ im f.

In more detail, if ker f = K, then the function φ: G/K → im f ≤ H, given by
φ: aK ↦ f(a), is an isomorphism.

Proof. We have already seen, in Proposition 2.91(ii), that K = ker f is a normal
subgroup of G. Now φ is well-defined: if aK = bK, then a = bk for some
k ∈ K, and so f(a) = f(bk) = f(b)f(k) = f(b), because f(k) = 1.

Let us now see that φ is a homomorphism. Since f is a homomorphism and
φ(aK) = f(a),

    φ(aKbK) = φ(abK) = f(ab) = f(a)f(b) = φ(aK)φ(bK).

It is clear that im φ ≤ im f. For the reverse inclusion, note that if y ∈ im f,
then y = f(a) for some a ∈ G, and so y = f(a) = φ(aK). Thus, φ is
surjective.

Finally, we show that φ is injective. If φ(aK) = φ(bK), then f(a) = f(b).
Hence, 1 = f(b)⁻¹f(a) = f(b⁻¹a), so that b⁻¹a ∈ ker f = K. Thus, aK =
bK, by Lemma 2.80(i), and so φ is injective. Therefore, φ: G/K → im f is an
isomorphism. •

Remark. The following diagram describes the proof of the first isomorphism
theorem, where π: G → G/K is the natural map π: a ↦ aK.

            f
       G ------> H
        \       ^
       π \     / φ
          v   /
          G/K                                                    ◄

Here is a minor application of the first isomorphism theorem. For any group
G, the identity function f: G → G is a surjective homomorphism with ker f =
{1}. By the first isomorphism theorem, we have

    G/{1} ≅ G.

Given any homomorphism f: G → H, one should immediately ask for its
kernel and its image; the first isomorphism theorem will then provide an
isomorphism G/ker f ≅ im f. Since there is no significant difference between
isomorphic groups, the first isomorphism theorem also says that there is no
significant difference between quotient groups and homomorphic images.
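
As a small illustration of asking for the kernel and the image, the sketch below takes the (assumed, not from the text) homomorphism f(a) = 3a on the additive group I_12 and checks that aK ↦ f(a) is a well-defined bijection from the cosets of K = ker f onto im f. The variable names are illustrative.

```python
n = 12
G = range(n)

def f(a):
    """Multiplication by 3, a homomorphism of the additive group I_12."""
    return (3 * a) % n

K = frozenset(a for a in G if f(a) == 0)                    # ker f
image = {f(a) for a in G}                                   # im f
cosets = {frozenset((a + k) % n for k in K) for a in G}     # elements of G/K

# phi: a + K -> f(a) is well-defined when f is constant on each coset.
well_defined = all(len({f(a) for a in c}) == 1 for c in cosets)
phi = {c: f(min(c)) for c in cosets}
bijective = len(set(phi.values())) == len(cosets) and set(phi.values()) == image

print(sorted(K), sorted(image), well_defined, bijective)
```

Here |G/K| = 12/3 = 4 = |im f|, in agreement with G/ker f ≅ im f.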

Example 2.115.

Let us revisit Example 2.88, which showed that any two cyclic groups of order m
are isomorphic. If G = ⟨a⟩ is a cyclic group of order m, define a homomorphism
f: Z → G by f(n) = a^n for all n ∈ Z. Now f is surjective (because a is a
generator of G), while ker f = {n ∈ Z : a^n = 1} = ⟨m⟩, by Lemma 2.53.
The first isomorphism theorem gives an isomorphism Z/⟨m⟩ ≅ G. We have
shown that every cyclic group of order m is isomorphic to Z/⟨m⟩, and hence
that any two cyclic groups of order m are isomorphic to each other. Of course,
Example 2.112 shows that Z/⟨m⟩ = I_m, so that every cyclic group of order m is
isomorphic to I_m. ◄

Example 2.116.

What is the quotient group R/Z? Define f: R → S¹, where S¹ is the circle
group, by

    f: x ↦ e^(2πix).

Now f is a homomorphism; that is, f(x + y) = f(x)f(y), by the addition
formulas for sine and cosine. The map f is surjective, and ker f consists of
all x ∈ R for which e^(2πix) = cos 2πx + i sin 2πx = 1. But cos 2πx = 1 and
sin 2πx = 0 force x to be an integer; since 1 ∈ ker f, we have ker f = Z. The first
isomorphism theorem now gives

    R/Z ≅ S¹. ◄

A natural question is whether HK is a subgroup when both H and K are
subgroups. In general, HK need not be a subgroup. For example, let G = S_3, let
H = ⟨(1 2)⟩, and let K = ⟨(1 3)⟩. Then

    HK = {(1), (1 2), (1 3), (1 3 2)}

is not a subgroup lest we contradict Lagrange's theorem, for 4 does not divide 6.
Exercise 2.95 on page 188 gives a necessary and sufficient condition for the
product HK of subgroups H and K to be a subgroup.

Proposition 2.117.

(i) If H and K are subgroups of a group G, and if one of them is a normal
subgroup, then HK is a subgroup of G; moreover, HK = KH in this case.

(ii) If both H and K are normal subgroups, then HK is a normal subgroup.

Proof.

(i) Assume first that K ◁ G. We claim that HK = KH. If hk ∈ HK, then
k′ = hkh⁻¹ ∈ K, because K ◁ G, and

    hk = hkh⁻¹h = k′h ∈ KH.

Hence, HK ⊆ KH. For the reverse inclusion, write kh = hh⁻¹kh = hk″ ∈ HK.
(Note that the same argument shows that HK = KH if H ◁ G.)

We now show that HK is a subgroup. Since 1 ∈ H and 1 ∈ K, we have 1 =
1·1 ∈ HK; if hk ∈ HK, then (hk)⁻¹ = k⁻¹h⁻¹ ∈ KH = HK; if hk, h_1k_1 ∈ HK,
then h_1⁻¹kh_1 = k′ ∈ K and

    hkh_1k_1 = hh_1(h_1⁻¹kh_1)k_1 = (hh_1)(k′k_1) ∈ HK.

Therefore, HK is a subgroup of G.

(ii) If g ∈ G, then

    ghkg⁻¹ = (ghg⁻¹)(gkg⁻¹) ∈ HK.

Therefore, HK ◁ G in this case. •

Here is a useful counting result. 

Proposition 2.118 (Product Formula). If H and K are subgroups of a finite
group G, then

    |HK||H ∩ K| = |H||K|,

where HK = {hk : h ∈ H and k ∈ K}.

Proof. Define a function f: H × K → HK by f: (h, k) ↦ hk. Clearly, f is
a surjection. It suffices to show, for every x ∈ HK, that |f⁻¹(x)| = |H ∩ K|,
where f⁻¹(x) = {(h, k) ∈ H × K : f(h, k) = x} [because H × K is the disjoint
union of the sets f⁻¹(x) for x ∈ HK].

We claim that if x = hk, then

    f⁻¹(x) = {(hd, d⁻¹k) : d ∈ H ∩ K}.

Each (hd, d⁻¹k) ∈ f⁻¹(x), for f(hd, d⁻¹k) = hdd⁻¹k = hk = x. For the
reverse inclusion, let (h′, k′) ∈ f⁻¹(x), so that h′k′ = hk. Then h⁻¹h′ =
kk′⁻¹ ∈ H ∩ K; call this element d. Then h′ = hd and k′ = d⁻¹k, and so
(h′, k′) lies in the right side. Therefore,

    |f⁻¹(x)| = |{(hd, d⁻¹k) : d ∈ H ∩ K}| = |H ∩ K|,

because d ↦ (hd, d⁻¹k) is a bijection. •
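
The product formula does not require HK to be a subgroup, which a brute-force check in S_3 makes visible. This is a sketch under stated assumptions: permutations of {0, 1, 2} are tuples, subgroups are obtained by closing a generating set under composition (enough in a finite group), and all names are illustrative.

```python
from itertools import permutations

def compose(p, q):
    """(p∘q)(i) = p(q(i)) for permutations stored as tuples."""
    return tuple(p[q[i]] for i in range(len(q)))

def generated(gens, n):
    """Subgroup of S_n generated by gens, found by closing under composition."""
    S = {tuple(range(n))} | set(gens)
    while True:
        new = {compose(a, b) for a in S for b in S} - S
        if not new:
            return frozenset(S)
        S |= new

def product(H, K):
    return frozenset(compose(h, k) for h in H for k in K)

n = 3
elems = list(permutations(range(n)))
subgroups = {generated([a, b], n) for a in elems for b in elems}

formula_ok = all(len(product(H, K)) * len(H & K) == len(H) * len(K)
                 for H in subgroups for K in subgroups)

H = generated([(1, 0, 2)], n)   # generated by the transposition swapping 0 and 1
K = generated([(2, 1, 0)], n)   # generated by the transposition swapping 0 and 2
HK = product(H, K)
print(formula_ok, len(HK), HK in subgroups, HK == product(K, H))
```

For these two subgroups of order 2, the set HK has 4 elements, is not a subgroup, and differs from KH, yet |HK||H ∩ K| = 4 · 1 = |H||K|.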

The next two results are variants of the first isomorphism theorem. 

Theorem 2.119 (Second Isomorphism Theorem). If H and K are subgroups
of a group G with H ◁ G, then HK is a subgroup, H ∩ K ◁ K, and

    K/(H ∩ K) ≅ HK/H.

Proof. We begin by showing first that HK/H makes sense and then describing
its elements. Since H ◁ G, Proposition 2.117 shows that HK is a subgroup.
Normality of H in HK follows from a more general fact: if H ≤ S ≤ G and if
H is normal in G, then H is normal in S (if ghg⁻¹ ∈ H for every g ∈ G, then,
in particular, ghg⁻¹ ∈ H for every g ∈ S).

We now show that each coset xH ∈ HK/H has the form kH for some
k ∈ K. Of course, xH = hkH, where h ∈ H and k ∈ K. But hk = k(k⁻¹hk) =
kh′ for some h′ ∈ H, so that hkH = kh′H = kH.

It follows that the function f: K → HK/H, given by f: k ↦ kH, is
surjective. Moreover, f is a homomorphism, for it is the restriction of the natural
map π: G → G/H. Since ker π = H, it follows that ker f = H ∩ K, and so
H ∩ K is a normal subgroup of K. The first isomorphism theorem now gives
K/(H ∩ K) ≅ HK/H. •

The second isomorphism theorem gives the product formula in the special
case when one of the subgroups is normal: if K/(H ∩ K) ≅ HK/H, then
|K/(H ∩ K)| = |HK/H|, and so |HK||H ∩ K| = |H||K|.

Theorem 2.120 (Third Isomorphism Theorem). If H and K are normal
subgroups of a group G with K ≤ H, then H/K ◁ G/K and

    (G/K)/(H/K) ≅ G/H.

Proof. Define f: G/K → G/H by f: aK ↦ aH. Note that f is a (well-defined)
function, for if a′ ∈ G and a′K = aK, then a⁻¹a′ ∈ K ≤ H, and so
aH = a′H. It is easy to see that f is a surjective homomorphism.

Now ker f = H/K, for aH = H if and only if a ∈ H, and so H/K is a
normal subgroup of G/K. Since f is surjective, the first isomorphism theorem
gives (G/K)/(H/K) ≅ G/H. •

The third isomorphism theorem is easy to remember: the K's in the fraction
(G/K)/(H/K) can be canceled. One can better appreciate the first isomorphism
theorem after having proved the third one. The quotient group (G/K)/(H/K)
consists of cosets (of H/K) whose representatives are themselves cosets (of
G/K). A direct proof of the third isomorphism theorem could be nasty.

The next result, which describes the subgroups of a quotient group G/K, can
be regarded as a fourth isomorphism theorem. Recall that a function f: X → Y
sets up a correspondence, using direct and inverse images, between subsets of
X and subsets of Y. We now adapt this viewpoint to the special case when
f: G → H is a homomorphism.

If G is a group and K ◁ G, let Sub(G; K) denote the family of all those
subgroups S of G containing K, and let Sub(G/K) denote the family of all the
subgroups of G/K.


Proposition 2.121 (Correspondence Theorem). If G is a group and K ◁ G,
then S ↦ S/K is a bijection Sub(G; K) → Sub(G/K). Denoting S/K by S*,
we have

(i) T ≤ S ≤ G in Sub(G; K) if and only if T* ≤ S* in Sub(G/K), in which
case [S : T] = [S* : T*];

(ii) T ◁ S in Sub(G; K) if and only if T* ◁ S* in Sub(G/K), in which case
S/T ≅ S*/T*.

Proof. Let Φ: Sub(G; K) → Sub(G/K) denote the function S ↦ S/K
(it is routine to check that if S is a subgroup of G containing K, then S/K is a
subgroup of G/K).

To see that Φ is injective, we begin by showing that if K ≤ S ≤ G, then
π⁻¹π(S) = S, where π: G → G/K is the natural map. As always, S ⊆
π⁻¹π(S), by Proposition 2.14(iii). For the reverse inclusion, let a ∈ π⁻¹π(S),
so that π(a) = π(s) for some s ∈ S. It follows that s⁻¹a ∈ ker π = K, so that
a = sk for some k ∈ K. But K ≤ S, and so a = sk ∈ S, as desired.

Assume now that π(S) = π(S′), where S and S′ are subgroups of G
containing K (note that π(S) = S/K). Then π⁻¹π(S) = π⁻¹π(S′), and, as we have just
proved in the preceding paragraph, this gives S = S′; hence, Φ is injective.

To see that Φ is surjective, let U be a subgroup of G/K. By Example 2.90(iv),
π⁻¹(U) is a subgroup of G containing K = π⁻¹({1}), and π(π⁻¹(U)) = U, by
Proposition 2.14(ii).

Proposition 2.14(i) shows that T ≤ S ≤ G implies T/K = π(T) ≤ π(S) =
S/K. Conversely, assume that T/K ≤ S/K. If t ∈ T, then tK ∈ T/K ≤ S/K
and so tK = sK for some s ∈ S. Hence, t = sk for some k ∈ K ≤ S, and so
t ∈ S.

In the important special case when G is finite, we prove [S : T] = [S* : T*]
as follows.


    [S* : T*] = |S*|/|T*|

              = |S/K|/|T/K|

              = (|S|/|K|) / (|T|/|K|)

              = |S|/|T|

              = [S : T].

To prove that [S : T] = [S* : T*] in the general case, it suffices to show that
there is a bijection from the family of all cosets of the form sT, where s ∈ S,
to the family of all cosets of the form s*T*, where s* ∈ S*, and the reader may
check that sT ↦ π(s)T* is such a bijection.

The third isomorphism theorem shows that if T ◁ S, then T/K ◁ S/K and
(S/K)/(T/K) ≅ S/T; that is, S*/T* ≅ S/T. It remains to show that T ◁ S if
T* ◁ S*; that is, if t ∈ T and s ∈ S, then sts⁻¹ ∈ T. Now

    π(sts⁻¹) = π(s)π(t)π(s)⁻¹ ∈ π(s)T*π(s)⁻¹ = T*,

so that sts⁻¹ ∈ π⁻¹(T*) = T. •

When dealing with quotient groups, one usually says, without mentioning
the correspondence theorem explicitly, that every subgroup of G/K has the form
S/K for a unique subgroup S ≤ G containing K. For example, if G is a finite
p-group, that is, if |G| = p^n for some prime p, then Theorem 2.146 says that
if G ≠ {1}, then the center is nontrivial; that is, Z(G) ≠ {1}. Hence, if G is
not abelian, then Z(G/Z(G)) ≠ {1}. An important role in the investigation of
finite p-groups is played by Z^2(G) which is, by definition, the inverse image of
Z(G/Z(G)). Note that Z^2(G) ◁ G, by the correspondence theorem.

Proposition 2.122. If G is a finite abelian group, then G has a subgroup of
order d for every divisor d of |G|. In particular, if p is a prime divisor of |G|,
then G contains an element of order p. (Compare Proposition 2.97.)

Proof. We begin by proving, by induction on n = |G|, that for every prime
divisor p of |G|, there is an element of order p in G. The base step n = 1 is
true, for there are no prime divisors of 1. For the inductive step, choose a ∈ G
of order k > 1. If p | k, say k = pℓ, then Exercise 2.34 on page 143 says
that a^ℓ has order p. If p ∤ k, consider the cyclic subgroup H = ⟨a⟩. Now
H ◁ G, because G is abelian, and so the quotient group G/H exists. Note that
|G/H| = n/k is divisible by p, and so the inductive hypothesis gives an element
bH ∈ G/H of order p. If b has order m, then (bH)^m = b^m H = H in G/H,
and so Lemma 2.53 gives p | m. We have returned to the first case.

We prove the general result by induction on d ≥ 1. The base step d = 1
is obviously true, and so we may assume that d > 1; that is, we may assume
that d has a prime divisor, say, p. By the first part of the proof, G contains a
subgroup H of order p. Since G is abelian, H ◁ G, and so the quotient group G/H
is defined. Moreover, |G/H| = |G|/p, so that (d/p) | |G/H|. The inductive hypothesis
gives a subgroup S* ≤ G/H with |S*| = d/p. By the correspondence theorem,
there is a subgroup S (where H ≤ S ≤ G) with S* = S/H. Therefore, |S| =
p|S*| = p · (d/p) = d. •

A theorem of Cauchy, Theorem 2.145, says that if p is a prime divisor of 
|G|, where G is any finite, not necessarily abelian, group, then G has an element 
of order p. 

Here is another construction of a new group from two given groups. 

Definition. If H and K are groups, then their direct product, denoted by H × K,
is the set of all ordered pairs (h, k) with h ∈ H and k ∈ K equipped with the
operation

    (h, k)(h′, k′) = (hh′, kk′).

It is routine to check that H × K is a group [the identity is (1, 1) and
(h, k)⁻¹ = (h⁻¹, k⁻¹)]. Note that H × K is abelian if and only if both H
and K are abelian.

Example 2.123.

The four-group V is isomorphic to I_2 × I_2. The reader may check that the function
f: V → I_2 × I_2, defined by

    f: (1) ↦ ([0], [0]),

    f: (1 2)(3 4) ↦ ([1], [0]),

    f: (1 3)(2 4) ↦ ([0], [1]),

    f: (1 4)(2 3) ↦ ([1], [1]),

is an isomorphism. ◄

We now apply the first isomorphism theorem to direct products. 

Proposition 2.124. Let G and G′ be groups, and let K ◁ G and K′ ◁ G′ be
normal subgroups. Then K × K′ is a normal subgroup of G × G′, and there is
an isomorphism

    (G × G′)/(K × K′) ≅ (G/K) × (G′/K′).

Proof. Let π: G → G/K and π′: G′ → G′/K′ be the natural maps. The
reader may check that f: G × G′ → (G/K) × (G′/K′), given by

    f: (g, g′) ↦ (π(g), π′(g′)) = (gK, g′K′)

is a surjective homomorphism with ker f = K × K′. The first isomorphism
theorem now gives the desired isomorphism. •

Here is a characterization of direct products. 

Proposition 2.125. If G is a group containing normal subgroups H and K with
H ∩ K = {1} and HK = G, then G ≅ H × K.

Proof. We show first that if g ∈ G, then the factorization g = hk, where
h ∈ H and k ∈ K, is unique. If hk = h′k′, then h′⁻¹h = k′k⁻¹ ∈ H ∩ K = {1}.
Therefore, h′ = h and k′ = k. We may now define a function φ: G → H × K
by φ(g) = (h, k), where g = hk, h ∈ H, and k ∈ K. To see whether φ
is a homomorphism, let g′ = h′k′, so that gg′ = hkh′k′. Hence,
φ(gg′) = φ(hkh′k′), which is not in the proper form for evaluation. If we knew
that if h ∈ H and k ∈ K, then hk = kh, then we could continue:

    φ(hkh′k′) = φ(hh′kk′)

              = (hh′, kk′)

              = (h, k)(h′, k′)

              = φ(g)φ(g′).

Let h ∈ H and k ∈ K. Since K is a normal subgroup, (hkh⁻¹)k⁻¹ ∈ K;
since H is a normal subgroup, h(kh⁻¹k⁻¹) ∈ H. But H ∩ K = {1}, so that
hkh⁻¹k⁻¹ = 1 and hk = kh. Finally, we show that the homomorphism φ is an
isomorphism. If (h, k) ∈ H × K, then the element g ∈ G defined by g = hk
satisfies φ(g) = (h, k); hence φ is surjective. If φ(g) = (1, 1), then g = 1, so
that ker φ = {1} and φ is injective. Therefore, φ is an isomorphism. •

All the hypotheses in Proposition 2.125 are needed. For example, let G =
S_3, H = ⟨(1 2 3)⟩, and K = ⟨(1 2)⟩. Now S_3 = HK, {1} = H ∩ K, and H ◁ S_3,
but K is not a normal subgroup. It is not true that S_3 ≅ H × K, for S_3 is not
abelian, while the direct product H × K of abelian groups is abelian.

Theorem 2.126. If m and n are relatively prime, then

    I_mn ≅ I_m × I_n.

Proof. If a ∈ Z, denote its congruence class in I_m by [a]_m. The reader can
show that the function f: Z → I_m × I_n, given by a ↦ ([a]_m, [a]_n), is a
homomorphism. We claim that ker f = ⟨mn⟩. Clearly, ⟨mn⟩ ≤ ker f. For the reverse
inclusion, if a ∈ ker f, then [a]_m = [0]_m and [a]_n = [0]_n; that is, a ≡ 0 mod m
and a ≡ 0 mod n; that is, m | a and n | a. Since m and n are relatively prime,
Exercise 1.55 on page 52 gives mn | a, and so a ∈ ⟨mn⟩, that is, ker f ≤ ⟨mn⟩,
and so ker f = ⟨mn⟩.

We now show that f is surjective. If ([a]_m, [b]_n) ∈ I_m × I_n, is there x ∈ Z
with f(x) = ([x]_m, [x]_n) = ([a]_m, [b]_n); that is, is there x ∈ Z with x ≡
a mod m and x ≡ b mod n? Since m and n are relatively prime, the Chinese
Remainder Theorem provides a solution x. The first isomorphism theorem now
gives I_mn = Z/⟨mn⟩ ≅ I_m × I_n. •

For example, it follows that I_6 ≅ I_2 × I_3. Note that there is no isomorphism
if m and n are not relatively prime. Example 2.123 shows that V ≅ I_2 × I_2,
which is not isomorphic to I_4 because V has no element of order 4.
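
The map [a]_mn ↦ ([a]_m, [a]_n) used in the proof can be tabulated directly. The following sketch (the function name is an illustrative choice) checks that it is a bijection exactly when m and n are relatively prime.

```python
from math import gcd

def f_is_bijection(m, n):
    """Is [a]_mn -> ([a]_m, [a]_n) a bijection from I_mn to I_m x I_n?"""
    image = {(a % m, a % n) for a in range(m * n)}
    return len(image) == m * n

# Bijective precisely for relatively prime m and n.
agree = all(f_is_bijection(m, n) == (gcd(m, n) == 1)
            for m in range(1, 13) for n in range(1, 13))
print(agree, f_is_bijection(2, 3), f_is_bijection(2, 2))
```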

In light of Proposition 2.72, we may say that an element a ∈ G has order n
if ⟨a⟩ ≅ I_n. Theorem 2.126 can now be interpreted as saying that if a and b are
commuting elements having relatively prime orders m and n, then ab has order
mn. Let us give a direct proof of this result.

Proposition 2.127. Let G be a group, and let a, b ∈ G be commuting elements
of orders m and n, respectively. If (m, n) = 1, then ab has order mn.

Proof. Since a and b commute, we have (ab)^r = a^r b^r for all r, so that (ab)^mn =
a^mn b^mn = 1. It suffices to prove that if (ab)^k = 1, then mn | k. If 1 = (ab)^k =
a^k b^k, then a^k = b^(−k). Since a has order m, we have 1 = a^(mk) = b^(−mk). Since b
has order n, Lemma 2.53 gives n | mk. As (m, n) = 1, however, Corollary 1.37
gives n | k; a similar argument gives m | k. Finally, Exercise 1.55 on page 52
shows that mn | k. Therefore, mn ≤ k, and mn is the order of ab. •
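
Proposition 2.127 can be spot-checked in an abelian group such as U(I_35), where all elements commute. In this sketch the modulus 35 and the helper `order` are illustrative assumptions, not taken from the text.

```python
from math import gcd

def order(r, m):
    """Multiplicative order of [r] in U(I_m); assumes gcd(r, m) == 1 and m > 1."""
    k, x = 1, r % m
    while x != 1:
        x = (x * r) % m
        k += 1
    return k

m = 35
U = [r for r in range(1, m) if gcd(r, m) == 1]

# Whenever the orders of a and b are relatively prime, ab has order equal to their product.
ok = all(order(a * b % m, m) == order(a, m) * order(b, m)
         for a in U for b in U
         if gcd(order(a, m), order(b, m)) == 1)
print(ok)

# The hypothesis is needed: [6] has order 2, but [6][6] = [1] has order 1, not 4.
print(order(6, m), order(6 * 6 % m, m))
```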

Here is a number-theoretic application of direct products. 

Corollary 2.128. If (m, n) = 1, then φ(mn) = φ(m)φ(n), where φ is the Euler
φ-function.

Proof.¹⁹ As in the proof of Theorem 2.126, we denote the elements of I_m by
[a]_m, and we recall that f: I_mn → I_m × I_n, defined by [a]_mn ↦ ([a]_m, [a]_n), is
an isomorphism. Now Lemma 2.106 says that |U(I_m)| = φ(m), where U(I_m) =
{[r] ∈ I_m : (r, m) = 1}. Thus, if we prove that f(U(I_mn)) = U(I_m) × U(I_n),
then the result will follow:

    φ(mn) = |U(I_mn)| = |f(U(I_mn))|

          = |U(I_m) × U(I_n)| = |U(I_m)| · |U(I_n)| = φ(m)φ(n).

¹⁹ See Exercise 3.53(iii) on page 249 for a less computational proof.

We claim that f(U(I_mn)) = U(I_m) × U(I_n). If [a]_mn ∈ U(I_mn), then
[a]_mn[b]_mn = [1]_mn for some [b]_mn ∈ I_mn, and

    f([ab]_mn) = ([ab]_m, [ab]_n) = ([a]_m[b]_m, [a]_n[b]_n)

               = ([a]_m, [a]_n)([b]_m, [b]_n) = ([1]_m, [1]_n).

Hence, [1]_m = [a]_m[b]_m and [1]_n = [a]_n[b]_n, so that f([a]_mn) = ([a]_m, [a]_n) ∈
U(I_m) × U(I_n), and f(U(I_mn)) ⊆ U(I_m) × U(I_n).

For the reverse inclusion, if f([c]_mn) = ([c]_m, [c]_n) ∈ U(I_m) × U(I_n), then
we must show that [c]_mn ∈ U(I_mn). There is [d]_m ∈ I_m with [c]_m[d]_m = [1]_m,
and there is [e]_n ∈ I_n with [c]_n[e]_n = [1]_n. Since f is surjective, there is b ∈ Z
with ([b]_m, [b]_n) = ([d]_m, [e]_n), so that

    f([1]_mn) = ([1]_m, [1]_n) = ([c]_m[b]_m, [c]_n[b]_n) = f([c]_mn[b]_mn).

Since f is an injection, [1]_mn = [c]_mn[b]_mn and [c]_mn ∈ U(I_mn). •
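
The key step, that f carries U(I_mn) onto U(I_m) × U(I_n), can be confirmed for small relatively prime m and n. The helper names in this sketch are illustrative, and φ is computed by counting units.

```python
from math import gcd

def units(m):
    """Representatives of U(I_m)."""
    return {r for r in range(1, m + 1) if gcd(r, m) == 1}

def image_of_units(m, n):
    """f(U(I_mn)) under [a]_mn -> ([a]_m, [a]_n)."""
    return {(a % m, a % n) for a in units(m * n)}

def unit_pairs(m, n):
    """U(I_m) x U(I_n), with each coordinate reduced to 0..m-1 and 0..n-1."""
    return {(r % m, s % n) for r in units(m) for s in units(n)}

coprime = [(m, n) for m in range(1, 15) for n in range(1, 15) if gcd(m, n) == 1]
images_match = all(image_of_units(m, n) == unit_pairs(m, n) for m, n in coprime)
phi_multiplicative = all(len(units(m * n)) == len(units(m)) * len(units(n))
                         for m, n in coprime)
print(images_match, phi_multiplicative)

# Without (m, n) = 1 the formula can fail: phi(4) = 2, but phi(2) * phi(2) = 1.
print(len(units(4)), len(units(2)) ** 2)
```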

Definition. If H_1, ..., H_n are groups, then their direct product

    H_1 × ⋯ × H_n

is the set of all n-tuples (h_1, ..., h_n), where h_i ∈ H_i for all i, with
coordinatewise multiplication:

    (h_1, ..., h_n)(h′_1, ..., h′_n) = (h_1h′_1, ..., h_nh′_n).

The basis theorem, Theorem 6.11, says that every finite abelian group is a 
direct product of cyclic groups. 

Here is a variant of Proposition 2.73. 

Proposition 2.129. If G is a finite abelian group having a unique subgroup of
order p for every prime divisor p of |G|, then G is cyclic.

Proof. Choose a ∈ G of largest order, say, n. If p is a prime divisor of |G|, let
C = C_p be the unique subgroup of G having order p; the subgroup C must be
cyclic, say C = ⟨c⟩. We show that p | n by showing that c ∈ ⟨a⟩ (and hence
C ≤ ⟨a⟩). If (p, n) = 1, then ca has order pn > n, by Proposition 2.127,
contradicting a being an element of largest order. If p | n, say, n = pq, then a^q
has order p, and hence it lies in the unique subgroup ⟨c⟩ of order p. Thus, a^q =
c^i for some i. Now (i, p) = 1, so there are integers u and v with 1 = ui + vp;
hence, c = c^(ui+vp) = c^(ui) c^(vp) = c^(ui). Therefore, a^(qu) = c^(ui) = c, so that c ∈ ⟨a⟩, as
desired. It follows that ⟨a⟩ contains every element x ∈ G with x^p = 1 for some
prime p.

If ⟨a⟩ = G, we are finished. Therefore, we may assume that there is b ∈ G
with b ∉ ⟨a⟩. Now b^|G| = 1 ∈ ⟨a⟩; let k be the smallest positive integer with
b^k ∈ ⟨a⟩:

    b^k = a^q.

Note that k | |G| because k is the order of b⟨a⟩ in G/⟨a⟩. Of course, k ≠ 1,
and so there is a factorization k = pm, where p is prime. There are now two
possibilities. If p | q, then q = pu and

    b^(pm) = b^k = a^q = a^(pu).

Hence, (b^m a^(−u))^p = 1, and so b^m a^(−u) ∈ ⟨a⟩. Thus, b^m ∈ ⟨a⟩, and this
contradicts k being the smallest exponent with this property. The second possibility is
that p ∤ q, in which case (p, q) = 1. There are integers s and t with 1 = sp + tq,
and so

    a = a^(sp+tq) = a^(sp) a^(tq) = a^(sp) b^(pmt) = (a^s b^(mt))^p.

Therefore, a = x^p, where x = a^s b^(mt), and Exercise 2.35 on page 143, which
applies because p | n, says that the order of x is greater than that of a, a
contradiction. We conclude that G = ⟨a⟩. •

The proposition is false for nonabelian groups, for the group of quaternions 
Q is a counterexample; it is a noncyclic group of order 8 having a unique sub- 
group of order 2. 


Exercises 

2.85 Prove that U(I_9) ≅ I_6 and U(I_15) ≅ I_4 × I_2.

2.86 (i) If H and K are groups, prove, without using the first isomorphism
theorem, that H* = {(h, 1) : h ∈ H} and K* = {(1, k) : k ∈ K} are
normal subgroups of H × K with H ≅ H* and K ≅ K*.

(ii) Prove that f: H → (H × K)/K*, defined by f(h) = (h, 1)K*, is an
isomorphism without using the first isomorphism theorem.

(iii) Use the first isomorphism theorem to prove that K* ◁ (H × K) and that
(H × K)/K* ≅ H.

*2.87 If G is a group and G/Z(G) is cyclic, where Z(G) denotes the center of G, prove
that G is abelian; that is, G = Z(G). Conclude that if G is not abelian, then
G/Z(G) is never cyclic.

*2.88 Let G be a finite group, let p be a prime, and let H be a normal subgroup of G.
Prove that if both |H| and |G/H| are powers of p, then |G| is a power of p.

*2.89 Call a group G finitely generated if there is a finite subset X ⊆ G with G = ⟨X⟩.
Prove that every subgroup S of a finitely generated abelian group G is itself finitely
generated. (This can be false if G is not abelian.)

*2.90 (i) Let π: G → H be a surjective homomorphism with ker π = T. Let
H = ⟨X⟩, and, for each x ∈ X, choose an element g_x ∈ G with π(g_x) =
x. Prove that G is generated by T ∪ {g_x : x ∈ X}.

(ii) Let G be a group and let T ◁ G. If both T and G/T are finitely generated,
prove that G is finitely generated.

*2.91 Let A, B and C be groups, and let α, β and γ be homomorphisms with γ ∘ α = β.

            α
       A ------> B
        \       /
       β \     / γ
          v   v
            C

If α is surjective, prove that ker γ = α(ker β).

*2.92 Let A and B be groups, let A′ ◁ A and B′ ◁ B be normal subgroups, and let
α: A → B be a homomorphism with α(A′) ⊆ B′.

(i) Prove that there is a (well-defined) homomorphism α*: A/A′ → B/B′
given by α*: aA′ ↦ α(a)B′.

(ii) Prove that if α is surjective, then α* is surjective.

(iii) Prove that if α is injective, then α* is injective.

2.93 (i) Prove that Q/Z(Q) ≅ V, where Q is the group of quaternions and V is
the four-group. Conclude that the quotient of a nonabelian group by its
center can be abelian.

(ii) Prove that Q has no subgroup isomorphic to V. Conclude that the
quotient Q/Z(Q) is not isomorphic to a subgroup of Q.

2.94 Let G be a finite group with K ◁ G. If (|K|, [G : K]) = 1, prove that K is the
unique subgroup of G having order |K|.

*2.95 Let H and K be subgroups of a group G.

(i) Prove that HK is a subgroup of G if and only if HK = KH. In particular,
the condition holds if hk = kh for all h ∈ H and k ∈ K.

(ii) If HK = KH and H ∩ K = {1}, prove that HK ≅ H × K.

2.96 Let G be a group and regard G × G as the direct product of G with itself. If the
multiplication μ: G × G → G is a group homomorphism, prove that G must be
abelian.

*2.97 Generalize Theorem 2.126 as follows. Let G be a finite (additive) abelian group of
order mn, where (m, n) = 1. Define

    G_m = {g ∈ G : order(g) | m} and G_n = {h ∈ G : order(h) | n}.

(i) Prove that G_m and G_n are subgroups with G_m ∩ G_n = {0}.

(ii) Prove that G = G_m + G_n = {g + h : g ∈ G_m and h ∈ G_n}.

(iii) Prove that G ≅ G_m × G_n.

*2.98 (i) Generalize Theorem 2.126 by proving that if the prime factorization of
an integer m is m = p_1^(e_1) ⋯ p_n^(e_n), then

    I_m ≅ I_{p_1^(e_1)} × ⋯ × I_{p_n^(e_n)}.

(ii) Generalize Corollary 2.128 by proving that if the prime factorization of
an integer m is m = p_1^(e_1) ⋯ p_n^(e_n), then

    U(I_m) ≅ U(I_{p_1^(e_1)}) × ⋯ × U(I_{p_n^(e_n)}).

2.99 Let p be an odd prime, and assume that a_i ≡ i mod p for 1 ≤ i ≤ p − 1. Prove
that there exist i ≠ j with i·a_i ≡ j·a_j mod p.

2.100 (i) If p is a prime, prove that φ(p^k) = p^k(1 − 1/p).

(ii) If the distinct prime divisors of a positive integer h are p_1, p_2, ..., p_n,
prove that

    φ(h) = h(1 − 1/p_1)(1 − 1/p_2) ⋯ (1 − 1/p_n).

2.101 If G is a group and x, y ∈ G, define their commutator to be xyx⁻¹y⁻¹, and define
the commutator subgroup G′ to be the subgroup generated by all the commutators
(the product of two commutators need not be a commutator).

(i) Prove that G′ ◁ G.

(ii) Prove that G/G′ is abelian.

(iii) If φ: G → A is a homomorphism, where A is an abelian group, prove
that G′ ≤ ker φ. Conversely, if G′ ≤ ker φ, prove that im φ is abelian.

(iv) If G′ ≤ H ≤ G, prove that H ◁ G.


2.7 Group Actions 

Groups of permutations led us to abstract groups; the next result, due to A. Cay- 
ley (1821-1895), shows that abstract groups are not so far removed from permu- 
tations. 

Theorem 2.130 (Cayley). Every group $G$ is (isomorphic to) a subgroup of
the symmetric group $S_G$. In particular, if $|G| = n$, then $G$ is isomorphic to a
subgroup of $S_n$.

Proof. For each $a \in G$, define “translation” $\tau_a : G \to G$ by $\tau_a(x) = ax$ for
every $x \in G$ (if $a \ne 1$, then $\tau_a$ is not a homomorphism). For $a, b \in G$,
$(\tau_a \circ \tau_b)(x) = \tau_a(\tau_b(x)) = \tau_a(bx) = a(bx) = (ab)x = \tau_{ab}(x)$, by associativity,
so that

$$\tau_a \tau_b = \tau_{ab}.$$

It follows that each $\tau_a$ is a bijection, for its inverse is $\tau_{a^{-1}}$:

$$\tau_a \tau_{a^{-1}} = \tau_{aa^{-1}} = \tau_1 = 1_G,$$

and so $\tau_a \in S_G$.

Define $\varphi : G \to S_G$ by $\varphi(a) = \tau_a$. Rewriting,

$$\varphi(a)\varphi(b) = \tau_a \tau_b = \tau_{ab} = \varphi(ab),$$

so that $\varphi$ is a homomorphism. Finally, $\varphi$ is an injection. If $\varphi(a) = \varphi(b)$, then
$\tau_a = \tau_b$, and hence $\tau_a(x) = \tau_b(x)$ for all $x \in G$; in particular, when $x = 1$, this
gives $a = b$, as desired.

The last statement follows from Exercise 2.56 on page 165, which says that
if $X$ is a set with $|X| = n$, then $S_X \cong S_n$. •

The reader may note, in the proof of Cayley’s theorem, that the permutation
$\tau_a$ is just the $a$th row of the multiplication table of $G$.
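The construction in the proof of Cayley's theorem can be checked by machine on a small group. The following is a minimal sketch, assuming that $S_3$ is represented by tuples `p` with `p[i]` the image of `i`; the helper names `mul` and `translation` are illustrative and not part of the text. Each translation is stored as the corresponding row of the multiplication table.

```python
from itertools import permutations

# Elements of S_3 as tuples p, where p[i] is the image of i (i = 0, 1, 2).
G = list(permutations(range(3)))

def mul(p, q):
    """Product pq of two permutations: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def translation(a):
    """tau_a : x -> ax, recorded as a permutation of the positions 0..|G|-1.

    The result lists the positions of a*x as x runs through G, that is,
    row a of the multiplication table of G.
    """
    return tuple(G.index(mul(a, x)) for x in G)

tau = {a: translation(a) for a in G}

# Each tau_a is a bijection of G (a permutation of the 6 positions).
assert all(sorted(t) == list(range(len(G))) for t in tau.values())
# a -> tau_a is a homomorphism: tau_a tau_b = tau_{ab}.
assert all(mul(tau[a], tau[b]) == tau[mul(a, b)] for a in G for b in G)
# a -> tau_a is injective, so G is isomorphic to a subgroup of S_6.
assert len(set(tau.values())) == len(G)
```

Since the checks only use the multiplication of $G$, the same sketch should work for any finite group for which a list of elements and a product function are supplied.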

To tell the truth, Cayley’s theorem itself is only mildly interesting. However, 
the identical proof works in a larger setting that is more useful. 

Theorem 2.131 (Representation on Cosets). Let $G$ be a group, and let $H$
be a subgroup of $G$ having finite index $n$. Then there exists a homomorphism
$\varphi : G \to S_n$ with $\ker\varphi \le H$.

Proof. Even though $H$ may not be a normal subgroup, we still denote the
family of all the cosets of $H$ in $G$ by $G/H$.

For each $a \in G$, define “translation” $\tau_a : G/H \to G/H$ by $\tau_a(xH) = axH$
for every $x \in G$. For $a, b \in G$,

$$(\tau_a \circ \tau_b)(xH) = \tau_a(\tau_b(xH)) = \tau_a(bxH) = a(bxH) = (ab)xH = \tau_{ab}(xH),$$

by associativity, so that

$$\tau_a \tau_b = \tau_{ab}.$$

It follows that each $\tau_a$ is a bijection, for its inverse is $\tau_{a^{-1}}$:

$$\tau_a \tau_{a^{-1}} = \tau_{aa^{-1}} = \tau_1 = 1_{G/H},$$

and so $\tau_a \in S_{G/H}$. Define $\varphi : G \to S_{G/H}$ by $\varphi(a) = \tau_a$. Rewriting,

$$\varphi(a)\varphi(b) = \tau_a \tau_b = \tau_{ab} = \varphi(ab),$$

so that $\varphi$ is a homomorphism. Finally, if $a \in \ker\varphi$, then $\varphi(a) = 1_{G/H}$, so that
$\tau_a(xH) = xH$ for all $x \in G$; in particular, when $x = 1$, this gives $aH = H$, and
$a \in H$, by Lemma 2.80(i). The result follows from Exercise 2.56 on page 165,
for $|G/H| = n$, and so $S_{G/H} \cong S_n$. •

When H = {1}, this is the Cayley theorem. 

We are now going to classify all groups of order up to 7. By Example 2.88,
every group of prime order $p$ is isomorphic to $I_p$, and so, up to isomorphism,
there is just one group of order $p$. Of the possible orders through 7, four of them,
2, 3, 5, and 7, are primes, and so we need look only at orders 4 and 6.

Proposition 2.132. Every group $G$ of order 4 is isomorphic to either $I_4$ or the
four-group $V$. Moreover, $I_4$ and $V$ are not isomorphic.

Proof. By Lagrange’s theorem, every element in $G$, other than 1, has order
either 2 or 4. If there is an element of order 4, then $G$ is cyclic. Otherwise,
$x^2 = 1$ for all $x \in G$, so that Exercise 2.38 on page 143 shows that $G$ is abelian.

If distinct elements $x$ and $y$ in $G$ are chosen, neither being 1, then one quickly
checks that $xy \notin \{1, x, y\}$; hence,

$$G = \{1, x, y, xy\}.$$

It is easy to see that the bijection $f : G \to V$, defined by $f(1) = 1$, $f(x) =
(1\ 2)(3\ 4)$, $f(y) = (1\ 3)(2\ 4)$, and $f(xy) = (1\ 4)(2\ 3)$, is an isomorphism, for
the product of any two elements of order 2 here is the other element of order 2.
We have already seen, in Example 2.89, that $I_4 \not\cong V$. •

Proposition 2.133. If $G$ is a group of order 6, then $G$ is isomorphic to either
$I_6$ or $S_3$.²⁰ Moreover, $I_6$ and $S_3$ are not isomorphic.

Proof. By Lagrange’s theorem, the only possible orders of nonidentity
elements are 2, 3, and 6. Of course, $G \cong I_6$ if $G$ has an element of order 6.
Now Exercise 2.40 on page 144 shows that $G$ must contain an element of order
2, say, $t$. Let $T = \langle t \rangle$.

Since $[G : T] = 3$, the representation on the cosets of $T$ is a homomorphism
$\rho : G \to S_{G/T} \cong S_3$ with $\ker\rho \le T$. Thus, $\ker\rho = \{1\}$ or $\ker\rho = T$. In the
first case, $\rho$ is an injection, and hence it is an isomorphism, for $|G| = 6 = |S_3|$.
In the second case, $\ker\rho = T$, so that $T \lhd G$ and the quotient group $G/T$ is
defined. Now $G/T$ is cyclic, for $|G/T| = 3$, so there is $a \in G$ with $G/T =
\{T, aT, a^2T\}$. Moreover, $\rho_t$ is the permutation

$$\rho_t = \begin{pmatrix} T & aT & a^2T \\ tT & taT & ta^2T \end{pmatrix}.$$

Since $t \in T = \ker\rho$, we have $\rho_t$ the identity. In particular, $aT = \rho_t(aT) =
taT$, so that $a^{-1}ta \in T = \{1, t\}$, by Lemma 2.80(i). But $a^{-1}ta \ne 1$, so that
$a^{-1}ta = t$; that is, $ta = at$. Now $a$ has order 3 or order 6 (for $a \ne 1$ and
$a^2 \ne 1$). In either case, $G$ has an element of order 6: if $a$ has order 3, then $at$
has order 6, by Proposition 2.127 (alternatively, just note that $(at)^6 = 1$ and that
$(at)^i \ne 1$ for $i < 6$). Therefore, $G$ is cyclic of order 6, and $G \cong I_6$.

It is clear that $I_6$ and $S_3$ are not isomorphic, for one is abelian and the other
is not. •

20 Cayley states this proposition in an article he wrote in 1854. However, in 1878, in the
American Journal of Mathematics, he wrote, “The general problem is to find all groups of a
given order $n$; ... if $n = 6$, there are three groups; a group

$$1, \alpha, \alpha^2, \alpha^3, \alpha^4, \alpha^5 \quad (\alpha^6 = 1),$$

and two more groups

$$1, \beta, \beta^2, \alpha, \alpha\beta, \alpha\beta^2 \quad (\alpha^2 = 1, \beta^3 = 1),$$

viz., in the first of these $\alpha\beta = \beta\alpha$ while in the other of them, we have $\alpha\beta = \beta^2\alpha$, $\alpha\beta^2 = \beta\alpha$.”
Cayley’s list is $I_6$, $I_2 \times I_3$, and $S_3$. Of course, $I_2 \times I_3 \cong I_6$. Even Homer nods.

One consequence of this result is another proof that $I_6 \cong I_2 \times I_3$ (see
Theorem 2.126).

Classifying groups of order 8 is more difficult, for we have not yet developed
enough theory (see my book, Advanced Modern Algebra, Theorem 5.83). It turns
out that there are only 5 nonisomorphic groups of order 8: three are abelian:
$I_8$; $I_4 \times I_2$; $I_2 \times I_2 \times I_2$; two are nonabelian: $D_8$; $Q$.


Order of Group    Number of Groups

      2                        1
      4                        2
      8                        5
     16                       14
     32                       51
     64                      267
    128                    2,328
    256                   56,092
    512               10,494,213
   1024           49,487,365,422

Table 2.4.


One can continue this discussion for larger orders, but things soon get out
of hand, as Table 2.4 shows (the calculation of the numbers in the table is very
sophisticated). The number of nonisomorphic groups having order $\le 2000$ was
found by E. O’Brien, but focusing on the numbers in Table 2.4 is more dramatic.
A. McIver and P. M. Neumann proved, for large $n$, that the number of
nonisomorphic groups of order $n$ is about $n^{\mu^2 + \mu + 2}$, where $\mu(n)$ is the largest exponent
occurring in the prime factorization of $n$. Obviously, making a telephone
directory of groups is not the way to study them.

Groups arose by abstracting the fundamental properties enjoyed by permu- 
tations. But there is an important feature of permutations that the axioms do 
not mention: permutations are functions. We shall see that there are interesting 
consequences when this feature is restored.

Let us agree on some notation before giving the next definition. A function
of two variables, $\alpha : X \times Y \to Z$, can be regarded as a one-parameter family
of functions of one variable: each $x \in X$ gives a function $\alpha_x : Y \to Z$, namely,
$\alpha_x(y) = \alpha(x, y)$.

Definition. If $X$ is a set and $G$ is a group, then $G$ acts on $X$²¹ if there exists a
function $\alpha : G \times X \to X$, called an action, such that

(i) for $g, h \in G$, $\alpha_g \circ \alpha_h = \alpha_{gh}$;

(ii) $\alpha_1 = 1_X$, the identity function.

If $G$ acts on $X$, we shall usually write $gx$ instead of $\alpha_g(x)$. In this notation,
axiom (i) reads $g(hx) = (gh)x$.

Of course, every subgroup $G \le S_X$ acts on $X$. More generally, actions of a
group $G$ on a set $X$ correspond to homomorphisms $G \to S_X$.

Proposition 2.134. If $\alpha : G \times X \to X$ is an action of a group $G$ on a set $X$,
then $g \mapsto \alpha_g$ defines a homomorphism $G \to S_X$. Conversely, if $B : G \to S_X$ is
a homomorphism, then $\beta : G \times X \to X$, defined by $\beta(g, x) = B(g)(x)$, is an
action.

Proof. If $\alpha : G \times X \to X$ is an action, then we claim that each $\alpha_g$ is a
permutation of $X$. Indeed, its inverse is $\alpha_{g^{-1}}$, because $\alpha_g \alpha_{g^{-1}} = \alpha_{gg^{-1}} = \alpha_1 = 1_X$. It
follows that $A : G \to S_X$, defined by $A(g) = \alpha_g$, is a function with the stated
target. That $A$ is a homomorphism follows from axiom (i):

$$A(gh) = \alpha_{gh} = \alpha_g \circ \alpha_h = A(g) \circ A(h).$$

Conversely, the function $\beta : G \times X \to X$, defined by a homomorphism
$B : G \to S_X$ as $\beta(g, x) = B(g)(x)$, is an action. According to our notational
agreement, $\beta_g = B(g)$. Thus, axiom (i) merely says that $B(g) \circ B(h) = B(gh)$,
which is true because $B$ is a homomorphism, while axiom (ii), $B(1) = 1_X$, holds
because every homomorphism takes the identity to the identity. •

Cayley’s theorem says that a group G acts on itself by (left) translation, and 
its generalization, the representation on cosets (Theorem 2.131), shows that G 
also acts on the family of cosets of a subgroup H by (left) translation. 

Example 2.135.

We show that $G$ acts on itself by conjugation; that is, for each $g \in G$, define
$\alpha_g : G \to G$ by

$$\alpha_g(x) = gxg^{-1}.$$

To verify axiom (i), note that for each $x \in G$,

$$(\alpha_g \circ \alpha_h)(x) = \alpha_g(\alpha_h(x)) = \alpha_g(hxh^{-1}) = g(hxh^{-1})g^{-1} = (gh)x(gh)^{-1} = \alpha_{gh}(x).$$

Therefore, $\alpha_g \circ \alpha_h = \alpha_{gh}$.

To prove axiom (ii), note that for each $x \in G$,

$$\alpha_1(x) = 1x1^{-1} = x,$$

and so $\alpha_1 = 1_G$. ◄

21 If $G$ acts on $X$, then one often calls $X$ a $G$-set.
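The two axioms of an action are easy to test exhaustively for a small group. The sketch below assumes $G = S_3$ acting on itself, with permutations stored as tuples; the function `is_action` and the three sample maps are illustrative names, not notation from the text. Conjugation and left translation pass both axioms, while right multiplication $x \mapsto xg$ fails axiom (i) in a nonabelian group.

```python
from itertools import permutations

G = list(permutations(range(3)))      # S_3; p[i] is the image of i
identity = tuple(range(3))

def mul(p, q):
    """Product pq: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def inv(p):
    """Inverse permutation."""
    r = [0] * len(p)
    for i, image in enumerate(p):
        r[image] = i
    return tuple(r)

def is_action(G, X, act):
    """Check axiom (i), g(hx) = (gh)x, and axiom (ii), 1x = x, by brute force."""
    axiom_i = all(act(g, act(h, x)) == act(mul(g, h), x)
                  for g in G for h in G for x in X)
    axiom_ii = all(act(identity, x) == x for x in X)
    return axiom_i and axiom_ii

def conjugation(g, x):
    return mul(mul(g, x), inv(g))     # x -> g x g^{-1}

def left_translation(g, x):
    return mul(g, x)                  # x -> g x

def right_multiplication(g, x):
    return mul(x, g)                  # x -> x g; g(hx) = xhg, but (gh)x = xgh

assert is_action(G, G, conjugation)
assert is_action(G, G, left_translation)
assert not is_action(G, G, right_multiplication)
```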

The following two definitions are fundamental. 

Definition. If $G$ acts on $X$ and $x \in X$, then the orbit of $x$, denoted by $\mathcal{O}(x)$, is
the subset of $X$:

$$\mathcal{O}(x) = \{gx : g \in G\} \subseteq X;$$

the stabilizer of $x$, denoted by $G_x$, is the subgroup of $G$:

$$G_x = \{g \in G : gx = x\} \le G.$$

It is easy to check that the stabilizer $G_x$ of a point $x$ is a subgroup of $G$.

Let us find orbits and stabilizers in the examples above. 


Example 2.136.

(i) Cayley’s theorem says that $G$ acts on itself by translations: $\tau_a : x \mapsto ax$.
If $x \in G$, then the orbit $\mathcal{O}(x) = G$, for if $g \in G$, then $g = (gx^{-1})x$. The
stabilizer $G_x$ of $x$ is $\{1\}$, for if $x = \tau_a(x) = ax$, then $a = 1$. One says that
$G$ acts transitively on $X$ when there is some $x \in X$ with $\mathcal{O}(x) = X$.

(ii) When $G$ acts on $G/H$ (the family of cosets of a subgroup $H$) by
translations $\tau_a : xH \mapsto axH$, then the orbit $\mathcal{O}(xH) = G/H$, for if $g \in G$ and
$a = gx^{-1}$, then $\tau_a : xH \mapsto gH$. Thus, $G$ acts transitively on $G/H$. The
stabilizer $G_{xH}$ of $xH$ is $xHx^{-1}$, for $axH = xH$ if and only if $x^{-1}ax \in H$
if and only if $a \in xHx^{-1}$. ◄

Figure 2.18 Dihedral Group


Example 2.137.

Let $X$ = the vertices $\{v_0, v_1, v_2, v_3\}$ of a square, and let $G$ be the dihedral
group $D_8$ acting on $X$, as in Figure 2.18 (for clarity, the vertices in the figure are
labeled 0, 1, 2, 3 instead of $v_0, v_1, v_2, v_3$).

$G = \{$rotations: $(1)$, $(v_0\ v_1\ v_2\ v_3)$, $(v_0\ v_2)(v_1\ v_3)$, $(v_0\ v_3\ v_2\ v_1)$;

reflections: $(v_1\ v_3)$, $(v_0\ v_2)$, $(v_0\ v_1)(v_2\ v_3)$, $(v_0\ v_3)(v_1\ v_2)\}$.

For each vertex $v_i \in X$, there is some $g \in G$ with $gv_0 = v_i$; therefore, $\mathcal{O}(v_0) =
X$ and $D_8$ acts transitively.

What is the stabilizer $G_{v_0}$ of $v_0$? Aside from the identity, there is only one
$g \in D_8$ fixing $v_0$, namely, $g = (v_1\ v_3)$; therefore $G_{v_0}$ is a subgroup of order 2.
(This example can be generalized to the dihedral group $D_{2n}$ acting on a regular
$n$-gon.) ◄

Example 2.138.

Let a group $G$ act on itself by conjugation. If $x \in G$, then

$$\mathcal{O}(x) = \{y \in G : y = axa^{-1} \text{ for some } a \in G\};$$

$\mathcal{O}(x)$ is called the conjugacy class of $x$, and it is often denoted by $x^G$. For
example, Proposition 2.33 shows that if $\alpha \in S_n$, then the conjugacy class of $\alpha$
consists of all the permutations in $S_n$ having the same cycle structure as $\alpha$.

If $x \in G$, then the stabilizer $G_x$ of $x$ is

$$C_G(x) = \{g \in G : gxg^{-1} = x\}.$$

This subgroup of $G$, consisting of all $g \in G$ that commute with $x$, is called the
centralizer of $x$ in $G$. ◄

Example 2.139.

Let $X = \{1, 2, \ldots, n\}$, let $\sigma \in S_n$, and regard the cyclic group $G = \langle \sigma \rangle$ as acting
on $X$. If $i \in X$, then

$$\mathcal{O}(i) = \{\sigma^k(i) : k \in \mathbb{Z}\}.$$

Let $\sigma = \beta_1 \cdots \beta_{t(\sigma)}$ be the complete factorization of $\sigma$, and let $i = i_0$ be moved
by $\sigma$. If the cycle involving $i_0$ is $\beta_j = (i_0\ i_1\ \ldots\ i_{r-1})$, then the proof of
Theorem 2.26 shows that $i_k = \sigma^k(i_0)$ for all $k \le r - 1$. Therefore,

$$\mathcal{O}(i) = \{i_0, i_1, \ldots, i_{r-1}\},$$

where $i = i_0$. It follows that $|\mathcal{O}(i)| = r$. The stabilizer $G_i$ of a symbol $i$ is $G$ if
$\sigma$ fixes $i$, and it is a proper subgroup of $G$ if $\sigma$ moves $i$. ◄

A group $G$ acting on a set $X$ gives an equivalence relation on $X$. Define

$$x \equiv y \text{ if there exists } g \in G \text{ with } y = gx.$$

If $x \in X$, then $1x = x$, where $1 \in G$, and so $x \equiv x$; hence, $\equiv$ is reflexive. If
$x \equiv y$, so that $y = gx$, then

$$g^{-1}y = g^{-1}(gx) = (g^{-1}g)x = 1x = x,$$

so that $x = g^{-1}y$ and $y \equiv x$; hence, $\equiv$ is symmetric. If $x \equiv y$ and $y \equiv z$, there
are $g, h \in G$ with $y = gx$ and $z = hy$, so that $z = hy = h(gx) = (hg)x$, and
$x \equiv z$. Therefore, $\equiv$ is transitive, and hence it is an equivalence relation. Now
the equivalence class of $x \in X$ is its orbit, for

$$[x] = \{y \in X : y \equiv x\} = \{gx : g \in G\} = \mathcal{O}(x).$$


Proposition 2.140. If $G$ acts on a set $X$, then $X$ is the disjoint union of the
orbits. If $X$ is finite, then

$$|X| = \sum_i |\mathcal{O}(x_i)|,$$

where one $x_i$ is chosen from each orbit.

Proof. This follows from Proposition 2.20, for the orbits form a partition of $X$.

The count given in the second statement is correct: since the orbits are
disjoint, no element in $X$ is counted twice. •

Here is the connection between orbits and stabilizers.

Theorem 2.141. If $G$ acts on a set $X$ and $x \in X$, then

$$|\mathcal{O}(x)| = [G : G_x],$$

the index of the stabilizer $G_x$ in $G$.

Proof. Let $G/G_x$ denote the family of all the cosets of $G_x$ in $G$. We will
exhibit a bijection $\varphi : \mathcal{O}(x) \to G/G_x$; this will give the result, since $|G/G_x| =
[G : G_x]$, by Corollary 2.82 of Lagrange’s theorem. If $y \in \mathcal{O}(x)$, then $y = gx$
for some $g \in G$; define $\varphi(y) = gG_x$. Now $\varphi$ is well-defined: if $y = hx$ for
some $h \in G$, then $h^{-1}gx = x$ and $h^{-1}g \in G_x$; hence $hG_x = gG_x$. To see that
$\varphi$ is injective, suppose that $\varphi(y) = \varphi(z)$; then there are $g, h \in G$ with $y = gx$,
$z = hx$, and $gG_x = hG_x$, that is, $h^{-1}g \in G_x$. It follows that $h^{-1}gx = x$,
and so $y = gx = hx = z$. Finally, $\varphi$ is a surjection: if $gG_x \in G/G_x$, then let
$y = gx \in \mathcal{O}(x)$, and note that $\varphi(y) = gG_x$. •

In Example 2.137, $D_8$ acting on the four corners of a square, we saw that
$|\mathcal{O}(v_0)| = 4$, $|G_{v_0}| = 2$, and $[G : G_{v_0}] = 8/2 = 4$. In Example 2.139,
$G = \langle \sigma \rangle \le S_n$ acting on $X = \{1, 2, \ldots, n\}$, we saw that if, in the complete
factorization of $\sigma$ into disjoint cycles $\sigma = \beta_1 \cdots \beta_{t(\sigma)}$, the $r$-cycle $\beta_j$ moves $i$,
then $r = |\mathcal{O}(i)|$ for any $i$ occurring in $\beta_j$. Theorem 2.141 says that $r$ is a divisor
of the order $k$ of $\sigma$. (But Theorem 2.54 tells us more: $k$ is the lcm of the lengths
of the cycles occurring in the factorization.)
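The count $|\mathcal{O}(x)| = [G : G_x]$ can be confirmed directly for $D_8$ acting on the square. This is a small sketch under stated assumptions: the vertices $v_0, \ldots, v_3$ are labeled 0, 1, 2, 3, a permutation is a tuple `p` with `p[i]` the image of `i`, and `generate`, `orbit`, and `stabilizer` are illustrative helper names.

```python
def mul(p, q):
    """Product pq of permutations given as tuples: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def generate(gens):
    """Finite group generated by gens, found by closing under left multiplication."""
    identity = tuple(range(len(gens[0])))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = mul(s, g)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group

r = (1, 2, 3, 0)   # rotation (v0 v1 v2 v3): vertex i goes to i + 1 mod 4
s = (0, 3, 2, 1)   # reflection (v1 v3): fixes v0 and v2
D8 = generate([r, s])

def orbit(G, x):
    return {g[x] for g in G}

def stabilizer(G, x):
    return {g for g in G if g[x] == x}

# Theorem 2.141 for every vertex: |O(v)| * |G_v| = |G|.
for v in range(4):
    assert len(orbit(D8, v)) * len(stabilizer(D8, v)) == len(D8)

print(len(D8), sorted(orbit(D8, 0)), sorted(stabilizer(D8, 0)))
# 8 [0, 1, 2, 3] [(0, 1, 2, 3), (0, 3, 2, 1)]
```

The stabilizer of vertex 0 consists of the identity and the reflection $(v_1\ v_3)$, in agreement with Example 2.137.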

Corollary 2.142. If a finite group $G$ acts on a set $X$, then the number of
elements in any orbit is a divisor of $|G|$.

Proof. This follows at once from Theorem 2.141 and Lagrange’s theorem. •

Corollary 2.143. If $x$ lies in a finite group $G$, then the number of conjugates of
$x$ is the index of its centralizer:

$$|x^G| = [G : C_G(x)],$$

and hence it is a divisor of $|G|$.

Proof. As in Example 2.138, the orbit of $x$ is its conjugacy class $x^G$, and the
stabilizer $G_x$ is the centralizer $C_G(x)$. •

In Example 2.29, there is a table displaying the number of permutations in
$S_4$ of each cycle structure; these numbers are 1, 6, 8, 6, 3. Note that each of
these numbers is a divisor of $|S_4| = 24$. In Example 2.30, we saw that the
corresponding numbers in $S_5$ are 1, 10, 20, 30, 24, 20, and 15, and these are
all divisors of $|S_5| = 120$. We now recognize these subsets as being conjugacy
classes, and the next corollary explains why these numbers divide the group
order.

Corollary 2.144. If $\alpha \in S_n$, then the number of permutations in $S_n$ having the
same cycle structure as $\alpha$ is a divisor of $n!$.

Proof. This follows at once from Corollary 2.143 once one recalls
Proposition 2.33, which says that two permutations in $S_n$ are conjugate in $S_n$ if and only
if they have the same cycle structure. •

When we began classifying groups of order 6, it would have been helpful to
be able to assert that any such group has an element of order 3 (we were able to
use an earlier exercise to assert the existence of an element of order 2). We now
prove that every finite group $G$ contains an element of prime order $p$ for every
$p \mid |G|$.

If the conjugacy class $x^G$ of an element $x$ in a group $G$ consists of $x$ alone,
then $x$ commutes with every $g \in G$, for $gxg^{-1} = x$; that is, $x \in Z(G)$.
Conversely, if $x \in Z(G)$, then $x^G = \{x\}$. Thus, the center $Z(G)$ consists of all those
elements in $G$ whose conjugacy class has exactly one element.

Theorem 2.145 (Cauchy). If $G$ is a finite group whose order is divisible by a
prime $p$, then $G$ contains an element of order $p$.

Proof. We prove the theorem by induction on $|G|$; the base step $|G| = 1$ is
vacuously true, for there are no prime divisors of 1. If $x \in G$, then the number
of conjugates of $x$ is $|x^G| = [G : C_G(x)]$, where $C_G(x)$ is the centralizer of $x$
in $G$. As noted above, if $x \notin Z(G)$, then $x^G$ has more than one element, and
so $|C_G(x)| < |G|$. If $p \mid |C_G(x)|$ for some noncentral $x$, then the inductive
hypothesis says there is an element of order $p$ in $C_G(x) \le G$, and we are done.
Therefore, we may assume that $p \nmid |C_G(x)|$ for all noncentral $x \in G$. Better,
since $|G| = [G : C_G(x)]\,|C_G(x)|$, Euclid’s lemma gives

$$p \mid [G : C_G(x)].$$

After recalling that $Z(G)$ consists of all those elements $x \in G$ with $|x^G| = 1$,
we may use Proposition 2.140 to see

$$|G| = |Z(G)| + \sum_i [G : C_G(x_i)],$$

where one $x_i$ is selected from each conjugacy class having more than one
element. Since $|G|$ and all $[G : C_G(x_i)]$ are divisible by $p$, it follows that $|Z(G)|$
is divisible by $p$. But $Z(G)$ is abelian, and so Proposition 2.122 says that $Z(G)$,
and hence $G$, contains an element of order $p$. •

Definition. The class equation of a finite group $G$ is

$$|G| = |Z(G)| + \sum_i [G : C_G(x_i)],$$

where one $x_i$ is selected from each conjugacy class having more than one
element.
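As an illustration of the class equation, the sketch below computes the conjugacy classes of $S_4$ by brute force and checks that $|G| = |Z(G)| + \sum_i [G : C_G(x_i)]$. It assumes permutations are stored as tuples on the symbols 0, 1, 2, 3; the helper names are illustrative only.

```python
from itertools import permutations

def mul(p, q):
    """Product pq: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def inv(p):
    r = [0] * len(p)
    for i, image in enumerate(p):
        r[image] = i
    return tuple(r)

G = list(permutations(range(4)))          # S_4, of order 24

def conjugacy_class(x):
    return frozenset(mul(mul(g, x), inv(g)) for g in G)

def centralizer(x):
    return [g for g in G if mul(g, x) == mul(x, g)]

classes = {conjugacy_class(x) for x in G}
center = [x for x in G if len(conjugacy_class(x)) == 1]
big = sorted(len(c) for c in classes if len(c) > 1)

# Corollary 2.143: |x^G| = [G : C_G(x)] for every x.
assert all(len(conjugacy_class(x)) * len(centralizer(x)) == len(G) for x in G)
# The class equation: 24 = 1 + (3 + 6 + 6 + 8).
assert len(G) == len(center) + sum(big)

print(len(center), big)   # 1 [3, 6, 6, 8]
```

The class sizes 1, 3, 6, 6, 8 are the numbers of permutations of each cycle structure in $S_4$ quoted above from Example 2.29.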


Definition. If $p$ is a prime, then a $p$-group is a group of order $p^n$ for some
$n \ge 0$.

There are groups whose center is trivial; for example, $Z(S_3) = \{1\}$. For
$p$-groups with more than one element, however, this is never true.

Theorem 2.146. If $p$ is a prime and $G$ is a $p$-group with more than one element,
then $Z(G) \ne \{1\}$.

Proof. Consider the class equation

$$|G| = |Z(G)| + \sum_i [G : C_G(x_i)].$$

Each $C_G(x_i)$ is a proper subgroup of $G$, for $x_i \notin Z(G)$. Since $G$ is a $p$-group,
$[G : C_G(x_i)]$ is a divisor of $|G|$, hence is itself a power of $p$. Thus, $p$ divides
each of the terms in the class equation other than $|Z(G)|$, and so $p \mid |Z(G)|$ as
well. Therefore, $Z(G) \ne \{1\}$. •

Corollary 2.147. If $p$ is a prime, then every group $G$ of order $p^2$ is abelian.

Proof. If $G$ is not abelian, then its center $Z(G)$ is a proper subgroup, so that
$|Z(G)| = 1$ or $p$, by Lagrange’s theorem. But Theorem 2.146 says that $Z(G) \ne
\{1\}$, and so $|Z(G)| = p$. The center is always a normal subgroup, so that the
quotient $G/Z(G)$ is defined; it has order $p$, and hence $G/Z(G)$ is cyclic. This
contradicts Exercise 2.87 on page 187. •


Example 2.148.

For every prime $p$, there exist nonabelian groups of order $p^3$. Define $\mathrm{UT}(3, p)$ to
be the subgroup of $\mathrm{GL}(3, I_p)$ consisting of all upper triangular matrices having
1’s on the diagonal; that is,

$$\mathrm{UT}(3, p) = \left\{ \begin{bmatrix} 1 & a & b \\ 0 & 1 & c \\ 0 & 0 & 1 \end{bmatrix} : a, b, c \in I_p \right\}.$$

It is easy to see that $\mathrm{UT}(3, p)$ is a subgroup of $\mathrm{GL}(3, I_p)$, and it has order $p^3$
because there are $p$ choices for each of $a, b, c$. The reader will have no
difficulty finding two matrices in $\mathrm{UT}(3, p)$ that do not commute. (Exercise 2.111 on
page 204 says that $\mathrm{UT}(3, 2) \cong D_8$.) ◄
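The claims about $\mathrm{UT}(3, p)$ can be tested for small primes. The following sketch assumes matrices are stored as nested tuples with entries reduced mod $p$; `matmul` and `UT` are hypothetical helper names. It checks the order $p^3$, closure under multiplication, a pair of noncommuting matrices, and that the center has order $p$ (so it is nontrivial, as Theorem 2.146 requires).

```python
from itertools import product

def matmul(A, B, p):
    """Product of two 3 x 3 matrices with entries reduced mod p."""
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(3)) % p for j in range(3))
        for i in range(3)
    )

def UT(p):
    """All upper triangular 3 x 3 matrices over I_p with 1's on the diagonal."""
    return [((1, a, b), (0, 1, c), (0, 0, 1))
            for a, b, c in product(range(p), repeat=3)]

# Two matrices that do not commute (for every prime p).
A = ((1, 1, 0), (0, 1, 0), (0, 0, 1))
B = ((1, 0, 0), (0, 1, 1), (0, 0, 1))

for p in (2, 3, 5):
    U = UT(p)
    members = set(U)
    assert len(U) == p ** 3
    # A finite subset closed under multiplication is a subgroup.
    assert all(matmul(X, Y, p) in members for X in U for Y in U)
    assert matmul(A, B, p) != matmul(B, A, p)
    center = [X for X in U if all(matmul(X, Y, p) == matmul(Y, X, p) for Y in U)]
    assert len(center) == p
```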
Example 2.149.

Who would have guessed that Cauchy’s theorem and Fermat’s theorem are
special cases of some common theorem?²² The elementary yet ingenious proof of
this is due to J. H. McKay (as A. Mann has shown me). If $G$ is a finite group and
$p$ is a prime, denote the cartesian product of $p$ copies of $G$ by $G^p$, and define

$$X = \{(a_1, a_2, \ldots, a_p) \in G^p : a_1 a_2 \cdots a_p = 1\}.$$

Note that $|X| = |G|^{p-1}$, for having chosen the first $p - 1$ entries arbitrarily, the
$p$th entry must equal $(a_1 a_2 \cdots a_{p-1})^{-1}$. Now make $X$ into an $I_p$-set by defining,
for $0 \le i \le p - 1$,

$$[i](a_1, a_2, \ldots, a_p) = (a_{i+1}, a_{i+2}, \ldots, a_p, a_1, \ldots, a_i).$$

The product of the entries in the new $p$-tuple is a conjugate of $a_1 a_2 \cdots a_p$:

$$a_{i+1} a_{i+2} \cdots a_p a_1 a_2 \cdots a_i = (a_1 a_2 \cdots a_i)^{-1} (a_1 a_2 \cdots a_p)(a_1 a_2 \cdots a_i).$$

This conjugate is 1 (for $g^{-1} 1 g = 1$), and so $[i](a_1, a_2, \ldots, a_p) \in X$. By
Corollary 2.142, the size of every orbit of $X$ is a divisor of $|I_p| = p$; since $p$ is prime,
these sizes are either 1 or $p$. Now orbits with just one element consist of a
$p$-tuple all of whose entries $a_i$ are equal, for all cyclic permutations of the $p$-tuple
are the same. In other words, such an orbit corresponds to an element $a \in G$
with $a^p = 1$. Clearly, $(1, 1, \ldots, 1)$ is such an orbit; if it were the only such, then
we would have

$$|G|^{p-1} = |X| = 1 + kp$$

for some $k \ge 0$; that is, $|G|^{p-1} \equiv 1 \bmod p$. If $p$ is a divisor of $|G|$, then we have
a contradiction, for $|G|^{p-1} \equiv 0 \bmod p$. We have thus proved Cauchy’s theorem:
if a prime $p$ is a divisor of $|G|$, then $G$ has an element of order $p$.

Suppose now that $G$ is a group of order $n$ and that $p$ is not a divisor of $n$; for
example, let $G = I_n$. By Lagrange’s theorem, $G$ has no elements of order $p$, so
that if $a \in G$ and $a^p = 1$, then $a = 1$. Therefore, the only orbit in $X$ of size 1
is $(1, 1, \ldots, 1)$, and so

$$n^{p-1} = |G|^{p-1} = |X| = 1 + kp;$$

that is, if $p$ is not a divisor of $n$, then $n^{p-1} \equiv 1 \bmod p$. Multiplying both sides
by $n$, we have $n^p \equiv n \bmod p$. This congruence also holds when $p$ is a divisor of
$n$, and this is Fermat’s theorem. ◄
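McKay's counting argument can be replayed numerically. The sketch below takes $G = S_3$ (an arbitrary choice made for illustration) and, for $p = 2, 3, 5$, builds the set $X$ of $p$-tuples with product 1, finds the tuples fixed by the cyclic shift, and checks that their number equals the number of solutions of $x^p = 1$ and is congruent to $|G|^{p-1}$ mod $p$. The function name `mckay_counts` is hypothetical.

```python
from itertools import permutations, product

G = list(permutations(range(3)))      # S_3; p[i] is the image of i
identity = tuple(range(3))

def mul(p, q):
    """Product pq: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def prod(elements):
    """Product a_1 a_2 ... a_k of a sequence of group elements."""
    result = identity
    for a in elements:
        result = mul(result, a)
    return result

def mckay_counts(p):
    # X = all p-tuples over G whose entries multiply to 1.
    X = [t for t in product(G, repeat=p) if prod(t) == identity]
    # Orbits of size 1 under the cyclic shift are the constant tuples in X.
    fixed = [t for t in X if t[1:] + t[:1] == t]
    # Solutions of x^p = 1 in G.
    solutions = [a for a in G if prod([a] * p) == identity]
    return len(X), len(fixed), len(solutions)

for p in (2, 3, 5):
    size_X, n_fixed, n_solutions = mckay_counts(p)
    assert size_X == len(G) ** (p - 1)
    assert n_fixed == n_solutions
    # The remaining orbits all have size p, so |X| - n_fixed is a multiple of p.
    assert (size_X - n_fixed) % p == 0
```

For $p = 2$ and $p = 3$, which divide $|S_3| = 6$, the number of solutions is a multiple of $p$ (4 and 3, respectively), so an element of order $p$ exists; for $p = 5$ the only solution is the identity and $6^4 \equiv 1 \bmod 5$.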

We have seen, in Proposition 2.97, that $A_4$ is a group of order 12 having no
subgroup of order 6. Thus, the assertion that if $d$ is a divisor of $|G|$, then $G$ must
have a subgroup of order $d$, is false. However, this assertion is true when $G$ is a
$p$-group. Indeed, more is true; $G$ must have a normal subgroup of order $d$.

22 If $G$ is a group of order $n$ and $p$ is a prime, then the number of solutions $x \in G$ of the
equation $x^p = 1$ is congruent to $n^{p-1}$ mod $p$.





Proposition 2.150. If $G$ is a group of order $|G| = p^e$, then $G$ has a normal
subgroup of order $p^k$ for every $k \le e$.

Proof. We prove the result by induction on $e \ge 0$. The base step is obviously
true, and so we proceed to the inductive step. By Theorem 2.146, the center of $G$
is nontrivial: $Z(G) \ne \{1\}$. If $Z(G) = G$, then $G$ is abelian, and we have already
proved the result in Proposition 2.122. Therefore, we may assume that $Z(G)$ is
a proper subgroup of $G$. Since $Z(G) \lhd G$, we have $G/Z(G)$ a $p$-group of order
strictly smaller than $|G|$. Assume that $|Z(G)| = p^c$. If $k \le c$, then $Z(G)$ and,
hence $G$, contains a normal subgroup of order $p^k$, because $Z(G)$ is abelian. If
$k > c$, then $G/Z(G)$ contains a normal subgroup $S^*$ of order $p^{k-c}$, by induction.
The correspondence theorem gives a normal subgroup $S$ of $G$ with

$$Z(G) \le S \le G$$

such that $S/Z(G) \cong S^*$. By Corollary 2.82 to Lagrange’s theorem,

$$|S| = |S^*|\,|Z(G)| = p^{k-c} \cdot p^c = p^k. \quad •$$

Abelian groups (and the quaternions) have the property that every subgroup
is normal. At the opposite pole are groups having no normal subgroups other
than the two obvious ones: $\{1\}$ and $G$.

Definition. A group $G$ is called simple if $G \ne \{1\}$ and $G$ has no normal
subgroups other than $\{1\}$ and $G$ itself.

Proposition 2.151. An abelian group $G$ is simple if and only if it is finite and
of prime order.

Proof. If $G$ is finite of prime order $p$, then $G$ has no subgroups $H$ other than
$\{1\}$ and $G$, otherwise Lagrange’s theorem would show that $|H|$ is a divisor of $p$.
Therefore, $G$ is simple.

Conversely, assume that $G$ is simple. Since $G$ is abelian, every subgroup
is normal, and so $G$ has no subgroups other than $\{1\}$ and $G$. Choose $x \in G$
with $x \ne 1$. Since $\langle x \rangle$ is a subgroup, we have $\langle x \rangle = G$. If $x$ has infinite order,
then all the powers of $x$ are distinct, and so $\langle x^2 \rangle < \langle x \rangle$ is a forbidden subgroup
of $\langle x \rangle$, a contradiction. Therefore, every $x \in G$ has finite order, say, $m$. If $m$
is composite, then $m = k\ell$ and $\langle x^k \rangle$ is a proper nontrivial subgroup of $\langle x \rangle$, a
contradiction. Therefore, $G = \langle x \rangle$ has prime order. •

We are now going to show that $A_5$ is a nonabelian simple group (indeed, it
is the smallest such; there is no nonabelian simple group of order less than 60).
Suppose that an element $x \in G$ has $k$ conjugates; that is

$$|x^G| = |\{gxg^{-1} : g \in G\}| = k.$$

If there is a subgroup $H \le G$ with $x \in H \le G$, how many conjugates does $x$
have in $H$? Since

$$x^H = \{hxh^{-1} : h \in H\} \subseteq \{gxg^{-1} : g \in G\} = x^G,$$

we have $|x^H| \le |x^G|$. It is possible that there is strict inequality $|x^H| < |x^G|$.
For example, take $G = S_3$, $x = (1\ 2)$, and $H = \langle x \rangle$. We know that $|x^G| =
3$ (because all transpositions are conjugate), whereas $|x^H| = 1$ (because $H$ is
abelian).

Now let us consider this question, in particular, for $G = S_5$, $x = (1\ 2\ 3)$,
and $H = A_5$.

Lemma 2.152. All 3-cycles are conjugate in $A_5$.

Proof. Let $G = S_5$, $\alpha = (1\ 2\ 3)$, and $H = A_5$. We know that $|\alpha^{S_5}| = 20$,
for there are 20 3-cycles in $S_5$ (as we saw in Example 2.30). Therefore, $20 =
|S_5|/|C_{S_5}(\alpha)| = 120/|C_{S_5}(\alpha)|$, by Corollary 2.143, so that $|C_{S_5}(\alpha)| = 6$; that is,
there are exactly six permutations in $S_5$ that commute with $\alpha$. Here they are:

$$(1),\ (1\ 2\ 3),\ (1\ 3\ 2),\ (4\ 5),\ (4\ 5)(1\ 2\ 3),\ (4\ 5)(1\ 3\ 2).$$

The last three of these are odd permutations, so that $|C_{A_5}(\alpha)| = 3$. We conclude
that

$$|\alpha^{A_5}| = |A_5|/|C_{A_5}(\alpha)| = 60/3 = 20;$$

that is, all 3-cycles are conjugate to $\alpha = (1\ 2\ 3)$ in $A_5$. •
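The numbers in this proof can be verified by enumeration. The sketch below assumes the symbols are relabeled 0 through 4, so that the 3-cycle $(1\ 2\ 3)$ of the text becomes the tuple `(1, 2, 0, 3, 4)`; the helper names are illustrative.

```python
from itertools import permutations

def mul(p, q):
    """Product pq: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def inv(p):
    r = [0] * len(p)
    for i, image in enumerate(p):
        r[image] = i
    return tuple(r)

def is_even(p):
    """A permutation is even exactly when it has an even number of inversions."""
    n = len(p)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
    return inversions % 2 == 0

S5 = list(permutations(range(5)))
A5 = [p for p in S5 if is_even(p)]
alpha = (1, 2, 0, 3, 4)               # the 3-cycle 0 -> 1 -> 2 -> 0

def centralizer(G, x):
    return [g for g in G if mul(g, x) == mul(x, g)]

def conjugacy_class(G, x):
    return {mul(mul(g, x), inv(g)) for g in G}

# A permutation moving exactly three symbols must be a 3-cycle.
three_cycles = {p for p in S5 if sum(1 for i in range(5) if p[i] != i) == 3}

assert len(three_cycles) == 20
assert len(centralizer(S5, alpha)) == 6
assert len(centralizer(A5, alpha)) == 3
assert conjugacy_class(A5, alpha) == three_cycles   # all 3-cycles are conjugate in A_5
```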

The next lemma, which says that $A_5$ is generated by the 3-cycles, can be
generalized from $A_5$ to $A_n$ for all $n \ge 5$; see Exercise 2.116 on page 205.

Lemma 2.153. Every element in $A_5$ is a 3-cycle or a product of 3-cycles.

Proof. If $\alpha \in A_5$, then $\alpha$ is a product of an even number of transpositions:
$\alpha = \tau_1 \tau_2 \cdots \tau_{2n-1} \tau_{2n}$. As the transpositions may be grouped in pairs $\tau_{2i-1}\tau_{2i}$, it
suffices to consider products $\tau\tau'$, where $\tau$ and $\tau'$ are transpositions. If $\tau$ and $\tau'$
are not disjoint, then $\tau = (i\ j)$, $\tau' = (i\ k)$, and $\tau\tau' = (i\ k\ j)$; if $\tau$ and $\tau'$ are
disjoint, then $\tau\tau' = (i\ j)(k\ l) = (i\ j)(j\ k)(j\ k)(k\ l) = (i\ j\ k)(j\ k\ l)$. •

Theorem 2.154. $A_5$ is a simple group.

Proof. We shall show that if $H$ is a normal subgroup of $A_5$ and $H \ne \{(1)\}$, then
$H = A_5$. Now if $H$ contains a 3-cycle, then normality forces $H$ to contain all its
conjugates. By Lemma 2.152, $H$ contains every 3-cycle, and by Lemma 2.153,
$H = A_5$. Therefore, it suffices to prove that $H$ contains a 3-cycle.

As $H \ne \{(1)\}$, it contains some $\sigma \ne (1)$. We may assume, after a harmless
relabeling, that either $\sigma = (1\ 2\ 3)$, $\sigma = (1\ 2)(3\ 4)$, or $\sigma = (1\ 2\ 3\ 4\ 5)$. As we
have just remarked, we are done if $\sigma$ is a 3-cycle.

If $\sigma = (1\ 2)(3\ 4) \in H$, use Proposition 2.32: conjugate $\sigma$ by $\beta = (3\ 4\ 5)$
to have $\beta\sigma\beta^{-1} = \sigma' = (1\ 2)(4\ 5) \in H$ (because $\beta \in A_5$ and $H \lhd A_5$). Hence,
$\sigma\sigma' = (3\ 4\ 5) \in H$.

If $\sigma = (1\ 2\ 3\ 4\ 5) \in H$, use Proposition 2.32: conjugate $\sigma$ by $\gamma = (1\ 2\ 3)$
to have $\gamma\sigma\gamma^{-1} = \sigma'' = (2\ 3\ 1\ 4\ 5) \in H$ (because $\gamma \in A_5$ and $H \lhd A_5$).
Hence, $\sigma''\sigma^{-1} = (2\ 3\ 1\ 4\ 5)(5\ 4\ 3\ 2\ 1) = (1\ 2\ 4) \in H$. We should say how
this last equation arose. If $\sigma \in H$ and $\gamma$ is a 3-cycle, then $\gamma\sigma\gamma^{-1} \in H$, and
so $(\gamma\sigma\gamma^{-1})\sigma^{-1} \in H$. Reassociating, $\gamma(\sigma\gamma^{-1}\sigma^{-1}) \in H$. But $\sigma\gamma^{-1}\sigma^{-1}$ is a
3-cycle, so that $H$ contains a product of two 3-cycles. We have chosen $\gamma$ more
carefully to force this product of two 3-cycles to be a 3-cycle.

We have shown, in all cases, that $H$ contains a 3-cycle. Therefore, the only
normal subgroups in $A_5$ are $\{(1)\}$ and $A_5$ itself, and so $A_5$ is simple. •
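A different, purely numerical check of the simplicity of $A_5$ follows the route suggested in Exercises 2.113 and 2.114: a normal subgroup is a union of conjugacy classes, one of which is $\{(1)\}$, and its order must divide 60. The sketch below is not the argument of the proof above; it assumes permutations are tuples on the symbols 0 through 4 and uses illustrative helper names.

```python
from itertools import combinations, permutations

def mul(p, q):
    """Product pq: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def inv(p):
    r = [0] * len(p)
    for i, image in enumerate(p):
        r[image] = i
    return tuple(r)

def is_even(p):
    n = len(p)
    return sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j]) % 2 == 0

A5 = [p for p in permutations(range(5)) if is_even(p)]

# Partition A_5 into conjugacy classes.
classes, seen = [], set()
for x in A5:
    if x not in seen:
        c = {mul(mul(g, x), inv(g)) for g in A5}
        classes.append(c)
        seen |= c

sizes = sorted(len(c) for c in classes)
nontrivial = [n for n in sizes if n > 1]

# Orders of unions of classes that contain the identity and divide |A_5|.
possible_orders = set()
for r in range(len(nontrivial) + 1):
    for combo in combinations(nontrivial, r):
        order = 1 + sum(combo)
        if len(A5) % order == 0:
            possible_orders.add(order)

print(sizes)                    # [1, 12, 12, 15, 20]
print(sorted(possible_orders))  # [1, 60]
```

Only the orders 1 and 60 survive, so the only candidates for normal subgroups are $\{(1)\}$ and $A_5$ itself.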

As we shall see in Chapter 5, Theorem 2.154 turns out to be the basic reason 
why the quadratic formula has no generalization giving the roots of polynomials 
of degree 5 or higher. 

Without much more effort, we can prove that the alternating groups $A_n$ are
simple for all $n \ge 5$. Observe that $A_4$ is not simple, for the four-group $V$ is a
normal subgroup of $A_4$.

Lemma 2.155. $A_6$ is a simple group.

Proof. Let $H \ne \{(1)\}$ be a normal subgroup of $A_6$; we must show that $H = A_6$.
Assume that there is some $\alpha \in H$ with $\alpha \ne (1)$ which fixes some $i$, where
$1 \le i \le 6$. Define

$$F = \{\sigma \in A_6 : \sigma(i) = i\}.$$

Now $\alpha \in H \cap F$, so that $H \cap F \ne \{(1)\}$. The second isomorphism theorem
gives $H \cap F \lhd F$. But $F$ is simple, for $F \cong A_5$, by Exercise 2.118 on page 205,
and so the only normal subgroups in $F$ are $\{(1)\}$ and $F$. Since $H \cap F \ne \{(1)\}$,
we have $H \cap F = F$; that is, $F \le H$. It follows that $H$ contains a 3-cycle, and
so $H = A_6$, by Exercise 2.116 on page 205.

We may now assume that there is no $\alpha \in H$ with $\alpha \ne (1)$ which fixes some
$i$ with $1 \le i \le 6$. If one considers the cycle structures of permutations in $A_6$,
however, any such $\alpha$ must have cycle structure $(1\ 2)(3\ 4\ 5\ 6)$ or $(1\ 2\ 3)(4\ 5\ 6)$.
In the first case, $\alpha^2 \in H$ is a nontrivial permutation which fixes 1 (and also 2), a
contradiction. In the second case, $H$ contains $\alpha(\beta\alpha^{-1}\beta^{-1})$, where $\beta = (2\ 3\ 4)$,
and it is easily checked that this is a nontrivial element in $H$ which fixes 6,
another contradiction. Therefore, no such normal subgroup $H$ can exist, and so
$A_6$ is a simple group. •

Theorem 2.156. $A_n$ is a simple group for all $n \ge 5$.

Proof. If $H$ is a nontrivial normal subgroup of $A_n$ [that is, $H \ne \{(1)\}$], then we
must show that $H = A_n$; by Exercise 2.116 on page 205, it suffices to prove
that $H$ contains a 3-cycle. If $\beta \in H$ is nontrivial, then there exists some $i$ that
$\beta$ moves; say, $\beta(i) = j \ne i$. Choose a 3-cycle $\alpha$ which fixes $i$ and moves
$j$. The permutations $\alpha$ and $\beta$ do not commute: $\beta\alpha(i) = \beta(i) = j$, while
$\alpha\beta(i) = \alpha(j) \ne j$. It follows that $\gamma = (\alpha\beta\alpha^{-1})\beta^{-1}$ is a nontrivial element of
$H$. But $\beta\alpha^{-1}\beta^{-1}$ is a 3-cycle, by Proposition 2.32, and so $\gamma = \alpha(\beta\alpha^{-1}\beta^{-1})$ is
a product of two 3-cycles. Hence, $\gamma$ moves at most 6 symbols, say, $i_1, \ldots, i_6$ (if
$\gamma$ moves fewer than 6 symbols, just adjoin others so we have a list of 6). Define

$$F = \{\sigma \in A_n : \sigma \text{ fixes all } i \ne i_1, \ldots, i_6\}.$$

Now $F \cong A_6$, by Exercise 2.118 on page 205, and $\gamma \in H \cap F$. Hence, $H \cap F$
is a nontrivial normal subgroup of $F$. But $F$ is simple, being isomorphic to $A_6$,
and so $H \cap F = F$; that is, $F \le H$. Therefore, $H$ contains a 3-cycle, and so
$H = A_n$; the proof is complete. •


Exercises 

2.102 If a and b are elements in a group G, prove that ab and ba have the same order. 

2.103 Prove that every translation x a e Sq, where z a : g !-»■ ag, is a regular permutation 
(see Exercise 2.26 on page 121). The homomorphism <p: G —*■ So, defined by 
<p(a ) = x a , is often called the regular representation of G. 

2.104 Prove that no pair of the following groups of order 8 , 

l8i I 4 X 1 2 ; n 2 x 1 2 x I 2 ; £*8; Q, 

are isomorphic. 

*2.105 If $p$ is a prime and $G$ is a finite group in which every element has order a power
of $p$, prove that $G$ is a $p$-group.

*2.106 Prove that a finite $p$-group $G$ is simple if and only if $|G| = p$.

*2.107 Show that $S_4$ has a subgroup isomorphic to $D_8$.

*2.108 Prove that $S_4/V \cong S_3$.

2.109 (i) Prove that $A_4 \not\cong D_{12}$.

(ii) Prove that $D_{12} \cong S_3 \times \mathbb{I}_2$.

*2.110 (i) If $H$ is a subgroup of $G$ and if $x \in H$, prove that

\[ C_H(x) = H \cap C_G(x). \]

(ii) If $H$ is a subgroup of index 2 in a finite group $G$ and if $x \in H$, prove that
either $|x^H| = |x^G|$ or $|x^H| = \tfrac{1}{2}|x^G|$, where $x^H$ is the conjugacy class of
$x$ in $H$.

*2.111 Prove that the group UT(3, 2) in Example 2.148 is isomorphic to $D_8$.

2.112 (i) How many permutations in $S_5$ commute with (1 2)(3 4), and how many
even permutations in $S_5$ commute with (1 2)(3 4)?

(ii) How many permutations in $S_7$ commute with (1 2)(3 4 5)?

(iii) Exhibit all the permutations in $S_7$ commuting with (1 2)(3 4 5).

*2.113 (i) Show that there are two conjugacy classes of 5-cycles in $A_5$, each of
which has 12 elements.

(ii) Prove that the conjugacy classes in $A_5$ have sizes 1, 12, 12, 15, and 20.

2.114 (i) Prove that every normal subgroup $H$ of a group $G$ is a union of conjugacy
classes of $G$, one of which is {1}.

(ii) Use part (i) and Exercise 2.113 to give a second proof of the simplicity
of $A_5$.

*2.115 If $\sigma, \tau \in S_5$, where $\sigma$ is a 5-cycle and $\tau$ is a transposition, prove that $\langle \sigma, \tau \rangle = S_5$.

*2.116 (i) For all $n \ge 3$, prove that every $\alpha \in A_n$ is a product of 3-cycles.

(ii) Prove that if a normal subgroup $H \lhd A_n$ contains a 3-cycle, where $n \ge 5$,
then $H = A_n$. (Remark. See Lemmas 2.153 and 2.153.)

2.117 Prove that the only normal subgroups of $S_4$ are {(1)}, $V$, $A_4$, and $S_4$.

*2.118 Let $\{i_1, \ldots, i_r\} \subseteq \{1, 2, \ldots, n\}$, and let

\[ F = \{\alpha \in A_n : \alpha \text{ fixes all } i \text{ with } i \ne i_1, \ldots, i_r\}. \]

Prove that $F \cong A_r$.

2.119 Prove that A5 is a group of order 60 that has no subgroup of order 30. 

2.120 Let $X = \{1, 2, 3, \ldots\}$ be the set of all positive integers, and let $S_X$ be the symmetric
group on $X$.

(i) Prove that $F_\infty = \{\sigma \in S_X : \sigma \text{ moves only finitely many } n \in X\}$ is a
subgroup of $S_X$.

(ii) Define $A_\infty$ to be the subgroup of $F_\infty$ generated by the 3-cycles. Prove
that $A_\infty$ is an infinite simple group.

2.121 (i) Prove that if a simple group G has a subgroup of index n, then G is 

isomorphic to a subgroup of S n . 

(ii) Prove that an infinite simple group has no subgroups of finite index $n > 1$.

*2.122 Let $G$ be a group with $|G| = mp$, where $p$ is a prime and $1 < m < p$. Prove that
$G$ is not simple.

Remark. Of all the numbers smaller than 60, we can now show that all but 11
are not orders of nonabelian simple groups (namely, 12, 18, 24, 30, 36, 40, 45, 48,
50, 54, 56). Theorem 2.146 eliminates all prime powers (for the center is always
a normal subgroup), and Exercise 2.122 eliminates all numbers of the form $mp$,
where $p$ is a prime and $m < p$. (We will complete the proof that there are no
nonabelian simple groups of order less than 60 in Theorem 6.25.) ◄

*2.123 If $n \ge 3$, prove that $A_n$ is the only subgroup of $S_n$ of order $\tfrac{1}{2}n!$.

*2.124 Prove that $A_6$ has no subgroups of prime index.


2.8 Counting with Groups 

We are now going to use group theory to do some fancy counting.

Lemma 2.157.

(i) Let a group $G$ act on a set $X$. If $x \in X$ and $\sigma \in G$, then $G_{\sigma x} = \sigma G_x \sigma^{-1}$.

(ii) If a finite group $G$ acts on a finite set $X$ and if $x$ and $y$ lie in the same orbit,
then $|G_y| = |G_x|$.

Proof.

(i) If $\tau \in G_x$, then $\tau x = x$. If $\sigma x = y$, we have

\[ \sigma\tau\sigma^{-1} y = \sigma\tau\sigma^{-1}\sigma x = \sigma\tau x = \sigma x = y. \]

Therefore, $\sigma\tau\sigma^{-1}$ fixes $y$, and so $\sigma G_x \sigma^{-1} \le G_y$. The reverse inclusion is
proved in the same way, for $x = \sigma^{-1} y$.

(ii) If $x$ and $y$ are in the same orbit, then there is $\sigma \in G$ with $y = \sigma x$, and so

\[ |G_y| = |G_{\sigma x}| = |\sigma G_x \sigma^{-1}| = |G_x|. \quad \bullet \]
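Part (i) of the lemma can be checked exhaustively for a small group. The sketch below is only an illustration under stated assumptions: it takes $G = S_4$ acting on {0, 1, 2, 3} in the natural way, with permutations stored as tuples, and the function names are ours.

```python
from itertools import permutations

def compose(p, q):
    """Return p∘q (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    """Return the inverse permutation of p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def stabilizer(group, x):
    """G_x: the elements of the group that fix the point x."""
    return {g for g in group if g[x] == x}

def check_conjugate_stabilizers(group, points):
    """Verify G_{sigma x} = sigma G_x sigma^{-1} for every sigma and x."""
    for x in points:
        g_x = stabilizer(group, x)
        for sigma in group:
            conjugated = {compose(compose(sigma, tau), inverse(sigma)) for tau in g_x}
            if conjugated != stabilizer(group, sigma[x]):
                return False
    return True

S4 = set(permutations(range(4)))
print(check_conjugate_stabilizers(S4, range(4)))
print(sorted(len(stabilizer(S4, x)) for x in range(4)))
```

Because all four points lie in one orbit, part (ii) predicts that the four stabilizers have the same order, which is what the second line displays.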

Theorem 2.158 (Burnside's Lemma).²³ Let $G$ act on a finite set $X$. If $N$ is the
number of orbits, then

\[ N = \frac{1}{|G|} \sum_{\tau \in G} F(\tau), \]

where $F(\tau)$ is the number of $x \in X$ fixed by $\tau$.

Proof. List the elements of $X$ as follows: choose $x_1 \in X$, and then list all
the elements in the orbit $\mathcal{O}(x_1)$; say, $\mathcal{O}(x_1) = \{x_1, x_2, \ldots, x_r\}$; then choose
$x_{r+1} \notin \mathcal{O}(x_1)$, and list the elements of $\mathcal{O}(x_{r+1})$ as $x_{r+1}, x_{r+2}, \ldots$; continue
this procedure until all the elements of $X$ are listed. Now list the elements
$\tau_1, \tau_2, \ldots, \tau_n$ of $G$, and form the following array of 0's and 1's, where

\[ f_{i,j} = \begin{cases} 1 & \text{if } \tau_i \text{ fixes } x_j \\ 0 & \text{if } \tau_i \text{ moves } x_j. \end{cases} \]

Now $F(\tau_i)$, the number of $x$ fixed by $\tau_i$, is the number of 1's in the $i$th row of
the array; therefore, $\sum_{\tau \in G} F(\tau)$ is the total number of 1's in the array. Let us
now look at the columns. The number of 1's in the first column is the number
of $\tau_i$ that fix $x_1$; by definition, these $\tau_i$ comprise $G_{x_1}$. Thus, the number of

²³ Burnside's influential book, The Theory of Groups of Finite Order, had two editions. In
the first edition, he attributed this theorem to G. Frobenius; in the second edition, he gave no
attribution at all. However, the commonly accepted name of this theorem is Burnside's lemma.
To avoid the confusion that would be caused by changing a popular name, P. M. Neumann
suggested that it be called "not-Burnside's lemma." Burnside was a fine mathematician, and
there do exist theorems properly attributed to him. For example, Burnside proved that if $p$ and
$q$ are primes, then there are no simple groups of order $p^m q^n$.






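\[
\begin{array}{c|ccccccc}
 & x_1 & \cdots & x_r & x_{r+1} & \cdots & x_j & \cdots \\ \hline
\tau_1 & f_{1,1} & \cdots & f_{1,r} & f_{1,r+1} & \cdots & f_{1,j} & \cdots \\
\tau_2 & f_{2,1} & \cdots & f_{2,r} & f_{2,r+1} & \cdots & f_{2,j} & \cdots \\
\vdots & & & & & & & \\
\tau_i & f_{i,1} & \cdots & f_{i,r} & f_{i,r+1} & \cdots & f_{i,j} & \cdots \\
\vdots & & & & & & & \\
\tau_n & f_{n,1} & \cdots & f_{n,r} & f_{n,r+1} & \cdots & f_{n,j} & \cdots
\end{array}
\]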

1's in column 1 is $|G_{x_1}|$. Similarly, the number of 1's in column 2 is $|G_{x_2}|$.
By Lemma 2.157(ii), $|G_{x_1}| = |G_{x_2}|$. By Theorem 2.141, the number of 1's in
the $r$ columns labeled by the $x_i \in \mathcal{O}(x_1)$ is thus

\[ r|G_{x_1}| = |\mathcal{O}(x_1)| \cdot |G_{x_1}| = (|G|/|G_{x_1}|)\,|G_{x_1}| = |G|. \]

The same is true for any other orbit: its columns contain exactly $|G|$ 1's. Therefore,
if there are $N$ orbits, there are $N|G|$ 1's in the array. We conclude that

\[ \sum_{\tau \in G} F(\tau) = N|G|. \quad \bullet \]
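The row-and-column count in this proof can be mirrored in a few lines of code. The sketch below is an illustration, not part of the text: it assumes a cyclic group of order 4 acting on binary 4-tuples by cyclic shifts (an example chosen here for its small size), counts orbits directly, and compares the result with the average number of fixed points. The function names are hypothetical.

```python
from itertools import product

def count_orbits(group, points, act):
    """Count orbits directly: collect the orbit of each point not yet seen."""
    seen = set()
    orbits = 0
    for x in points:
        if x not in seen:
            orbits += 1
            seen.update(act(g, x) for g in group)
    return orbits

def burnside_count(group, points, act):
    """Total number of 1's in the array of the proof, divided by |G|."""
    ones = sum(1 for g in group for x in points if act(g, x) == x)
    return ones // len(group)

def shift(k, x):
    """Assumed example action: rotate the tuple x cyclically by k places."""
    return x[k:] + x[:k]

group = [0, 1, 2, 3]                      # shifts by 0, 1, 2, 3 places
points = list(product((0, 1), repeat=4))  # the 16 binary 4-tuples

print(count_orbits(group, points, shift), burnside_count(group, points, shift))
```

Both numbers agree, as the lemma predicts; here the fixed-point counts are 16, 2, 4, 2, whose average is 6.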

We are going to use Burnside’s lemma to solve problems of the following 
sort. How many striped flags are there having six stripes (of equal width) each 
of which can be colored red, white, or blue? Clearly, the two flags in Figure 2.19 
are the same: the bottom flag is just the reverse of the top one (the flag may be 
viewed by standing in front of it or by standing in back of it). 


[Figure 2.19: A Flag. Top flag, stripes in order: r, w, b, r, w, b. Bottom flag, stripes in order: b, w, r, b, w, r.]


Let $X$ be the set of all 6-tuples of colors; if $x \in X$, then

\[ x = (c_1, c_2, c_3, c_4, c_5, c_6), \]

where each $c_i$ denotes either red, white, or blue. Let $\tau$ be the permutation that
reverses all the indices:

\[ \tau = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 6 & 5 & 4 & 3 & 2 & 1 \end{pmatrix} = (1\ 6)(2\ 5)(3\ 4) \]

(thus, $\tau$ "turns over" each 6-tuple $x$ of colored stripes). The cyclic group $G = \langle \tau \rangle$
acts on $X$; since $|G| = 2$, the orbit of any 6-tuple $x$ consists of either 1 or 2
elements: either $\tau$ fixes $x$ or it does not. Since a flag is unchanged by turning it
over, it is reasonable to identify a flag with an orbit of a 6-tuple. For example,
the orbit consisting of the 6-tuples

(r, w, b, r, w, b) and (b, w, r, b, w, r)

describes the flag in Figure 2.19. The number of flags is thus the number $N$ of
orbits; by Burnside's lemma, $N = \tfrac{1}{2}[F((1)) + F(\tau)]$. The identity permutation
(1) fixes every $x \in X$, and so $F((1)) = 3^6$ (there are 3 colors). Now $\tau$ fixes a
6-tuple $x$ if and only if $x$ is a "palindrome," that is, if the colors in $x$ read the
same forward as backward. For example,

x = (r, r, w, w, r, r)

is fixed by $\tau$. Conversely, if

\[ x = (c_1, c_2, c_3, c_4, c_5, c_6) \]

is fixed by $\tau = (1\ 6)(2\ 5)(3\ 4)$, then $c_1 = c_6$, $c_2 = c_5$, and $c_3 = c_4$; that is, $x$ is
a palindrome. It follows that $F(\tau) = 3^3$, for there are 3 choices for each of $c_1$,
$c_2$, and $c_3$. The number of flags is thus

\[ N = \tfrac{1}{2}(3^6 + 3^3) = 378. \]
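The flag count is small enough to confirm by brute force. The following sketch (variable names are ours) enumerates all $3^6$ six-tuples, counts the palindromes fixed by $\tau$, and also counts the orbits directly as unordered pairs {x, reversed x}.

```python
from itertools import product

COLORS = "rwb"  # red, white, blue
tuples = list(product(COLORS, repeat=6))

def turn_over(x):
    """The action of tau: reverse the order of the six stripes."""
    return x[::-1]

fixed_by_identity = len(tuples)                                # F((1))
fixed_by_tau = sum(1 for x in tuples if turn_over(x) == x)     # F(tau): palindromes

# Burnside's lemma for the group of order 2.
n_flags = (fixed_by_identity + fixed_by_tau) // 2

# Direct count: each orbit is the set {x, tau(x)}.
orbits = {frozenset((x, turn_over(x))) for x in tuples}

print(fixed_by_identity, fixed_by_tau, n_flags, len(orbits))
```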

Let us make the notion of coloring more precise. 


Definition. Given an action of a group $G$ on $X = \{1, \ldots, n\}$ and a set $C$ of $q$
colors, then $G$ acts on the set $C^n$ of all $n$-tuples of colors by

\[ \tau(c_1, \ldots, c_n) = (c_{\tau 1}, \ldots, c_{\tau n}) \quad \text{for all } \tau \in G. \]

An orbit of $(c_1, \ldots, c_n) \in C^n$ is called a $(q, G)$-coloring of $X$.


Example 2.159. 

Color each square in a 4 x 4 grid red or black (adjacent squares may have the 
same color; indeed, one possibility is that all the squares have the same color). 

If $X$ consists of the 16 squares in the grid and if $C$ consists of the two colors
red and black, then the cyclic group $G = \langle R \rangle$ of order 4 acts on $X$, where $R$ is
clockwise rotation by 90°; Figure 2.20 shows how $R$ acts: the right square is $R$'s
action on the left square.

Figure 2.20 Chessboard

In cycle notation,

\[ R = (1, 4, 16, 13)(2, 8, 15, 9)(3, 12, 14, 5)(6, 7, 11, 10), \]

\[ R^2 = (1, 16)(4, 13)(2, 15)(8, 9)(3, 14)(12, 5)(6, 11)(7, 10), \]

\[ R^3 = (1, 13, 16, 4)(2, 9, 15, 8)(3, 5, 14, 12)(6, 10, 11, 7). \]

A red-and-black chessboard does not change when it is rotated; it is merely
viewed from a different position. Thus, we may regard a chessboard as a $(2, G)$-coloring
of $X$; the orbit of a 16-tuple corresponds to the four ways of viewing
the board.

By Burnside's lemma, the number of chessboards is

\[ \tfrac{1}{4}\left[F((1)) + F(R) + F(R^2) + F(R^3)\right]. \]

Now $F((1)) = 2^{16}$, for every 16-tuple is fixed by the identity. To compute $F(R)$,
note that squares 1, 4, 16, 13 must all have the same color in a 16-tuple fixed by
$R$. Similarly, squares 2, 8, 15, 9 must have the same color, squares 3, 12, 14, 5
must have the same color, and squares 6, 7, 11, 10 must have the same color. We
conclude that $F(R) = 2^4$; note that the exponent 4 is the number of cycles in the
complete factorization of $R$. A similar analysis shows that $F(R^2) = 2^8$, for the
complete factorization of $R^2$ has 8 cycles, and $F(R^3) = 2^4$, because the cycle
structure of $R^3$ is the same as that of $R$. Therefore, the number $N$ of chessboards
is

\[ N = \tfrac{1}{4}\left[2^{16} + 2^4 + 2^8 + 2^4\right] = 16{,}456. \]

Doing this count without group theory is more difficult because of the danger of
counting the same chessboard more than once. ◄
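For readers who want to see the four fixed-point counts appear, here is a brute-force sketch. It assumes a board is stored as a tuple of 16 entries read row by row (0 for one color, 1 for the other); the helper `rotate` is our own and is not taken from the text.

```python
from itertools import product

def rotate(board):
    """Rotate a 4x4 board clockwise by 90 degrees.

    Entry (r, c) of the rotated board is taken from entry (3 - c, r)
    of the original board, rows and columns numbered from 0.
    """
    return tuple(board[(3 - c) * 4 + r] for r in range(4) for c in range(4))

fixed = [0, 0, 0, 0]  # fixed[k] counts boards fixed by the rotation R^k
for board in product((0, 1), repeat=16):
    image = board
    for k in range(4):
        if image == board:
            fixed[k] += 1
        image = rotate(image)

n_chessboards = sum(fixed) // 4
print(fixed, n_chessboards)
```

The loop visits all $2^{16}$ colorings, so it takes a moment to run; the point of Burnside's lemma is that none of this enumeration is actually needed.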

We now show that the cycle structure of a permutation $\tau$ allows one to calculate $F(\tau)$.

Theorem 2.160. Let $C$ be a set of $q$ colors, and let $\tau \in S_n$.

(i) If $F(\tau)$ is the number of $x \in C^n$ fixed by $\tau$, and if $t(\tau)$ is the number of
cycles in the complete factorization of $\tau$, then

\[ F(\tau) = q^{t(\tau)}. \]

(ii) If a finite group $G$ acts on $X = \{1, \ldots, n\}$, then the number $N$ of $(q, G)$-colorings
of $X$ is

\[ N = \frac{1}{|G|} \sum_{\tau \in G} q^{t(\tau)}, \]

where $t(\tau)$ is the number of cycles in the complete factorization of $\tau$.


Proof.

(i) Let $\tau \in S_n$ and let $\tau = \beta_1 \cdots \beta_t$ be a complete factorization, where each $\beta_j$
is an $r_j$-cycle. If $i_1, \ldots, i_{r_j}$ are the symbols moved by $\beta_j$, then $i_{k+1} = \tau^k i_1$
for $k < r_j$. Since $\tau(c_1, \ldots, c_n) = (c_{\tau 1}, \ldots, c_{\tau n}) = (c_1, \ldots, c_n)$, we see that
$c_{\tau i_1} = c_{i_1}$; that is, $\tau i_1$ and $i_1$ have the same color. But $\tau^2 i_1$ also has the same color as $i_1$; in fact,
$\tau^k i_1$ has the same color as $i_1$ for all $k$. Now there is another way to view these
points. By Example 2.139, the points $\tau^k i_1$ are precisely the symbols moved by
$\beta_j$; that is, $\beta_j = (i_1, i_2, \ldots, i_{r_j})$. Thus, $(c_1, \ldots, c_n)$ is fixed by $\tau$ if and only if, for each $j$,
all the $c_k$ for $k$ moved by $\beta_j$ are the same color. As there are $q$
colors and $t(\tau)$ cycles $\beta_j$, there are $q^{t(\tau)}$ $n$-tuples fixed by $\tau$.

(ii) Substitute $q^{t(\tau)}$ for $F(\tau)$ into the formula in Burnside's lemma. •
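Part (i) is easy to test numerically. In the sketch below (helper names are ours, and permutations act on {0, ..., n-1} rather than {1, ..., n}), `cycle_count` computes $t(\tau)$, counting 1-cycles as well, and `fixed_colorings` counts the tuples with $(c_{\tau 1}, \ldots, c_{\tau n}) = (c_1, \ldots, c_n)$ by brute force.

```python
from itertools import product

def cycle_count(perm):
    """t(perm): number of cycles, including 1-cycles, of a permutation of {0..n-1}."""
    seen = set()
    count = 0
    for start in range(len(perm)):
        if start not in seen:
            count += 1
            s = start
            while s not in seen:
                seen.add(s)
                s = perm[s]
    return count

def fixed_colorings(perm, q):
    """Brute force: number of colorings c in C^n with (c[perm[0]], ..., c[perm[n-1]]) == c."""
    n = len(perm)
    return sum(
        1
        for c in product(range(q), repeat=n)
        if tuple(c[perm[i]] for i in range(n)) == c
    )

tau = (5, 4, 3, 2, 1, 0)    # the flag reversal (1 6)(2 5)(3 4), written 0-based
sigma = (1, 2, 0, 3, 4)     # a 3-cycle together with two fixed points

for perm, q in ((tau, 3), (sigma, 2)):
    print(cycle_count(perm), fixed_colorings(perm, q), q ** cycle_count(perm))
```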


Example 2.161. 

We can now simplify the computations in Example 2.159. The group $G$ acting on
the set $X$ of all 4 × 4 grids consists of the 4 elements $1, R, R^2, R^3$. The complete
factorizations of these elements were given in the example, from which we see
that

\[ t(1) = 16, \quad t(R) = 4 = t(R^3), \quad t(R^2) = 8. \]

It follows from Theorem 2.160 that

\[ N = \tfrac{1}{4}\left[2^{16} + 2 \cdot 2^4 + 2^8\right]. \quad ◄ \]

We introduce a polynomial in several variables to allow us to state a more
delicate counting result due to Pólya.

Definition. If the complete factorization of $\tau \in S_n$ has $e_r(\tau) \ge 0$ $r$-cycles,
then the index of $\tau$ is the monomial

\[ \operatorname{ind}(\tau) = x_1^{e_1(\tau)} x_2^{e_2(\tau)} \cdots x_n^{e_n(\tau)}. \]

If $G$ is a subgroup of $S_n$, then the cycle index of $G$ is the polynomial in $n$ variables
with coefficients in $\mathbb{Q}$:

\[ P_G(x_1, \ldots, x_n) = \frac{1}{|G|} \sum_{\tau \in G} \operatorname{ind}(\tau). \]

In our earlier discussion of the striped flags, the group $G$ was a cyclic group
of order 2 with generator $\tau = (1\ 6)(2\ 5)(3\ 4)$. Thus, $\operatorname{ind}((1)) = x_1^6$, $\operatorname{ind}(\tau) = x_2^3$, and

\[ P_G(x_1, \ldots, x_6) = \tfrac{1}{2}(x_1^6 + x_2^3). \]

As a second example, consider all possible blue-and-white flags having 9
stripes. Here $|X| = 9$ and $G = \langle \tau \rangle \le S_9$, where

\[ \tau = (1\ 9)(2\ 8)(3\ 7)(4\ 6)(5). \]

Now, $\operatorname{ind}((1)) = x_1^9$, $\operatorname{ind}(\tau) = x_1 x_2^4$, and the cycle index of $G = \langle \tau \rangle$ is thus

\[ P_G(x_1, \ldots, x_9) = \tfrac{1}{2}(x_1^9 + x_1 x_2^4). \]

In Example 2.159, we saw that the cyclic group $G = \langle R \rangle$ of order 4 acts on
a grid with 16 squares, and:

\[ \operatorname{ind}((1)) = x_1^{16}; \quad \operatorname{ind}(R) = x_4^4; \quad \operatorname{ind}(R^2) = x_2^8; \quad \operatorname{ind}(R^3) = x_4^4. \]

The cycle index is thus

\[ P_G(x_1, \ldots, x_{16}) = \tfrac{1}{4}(x_1^{16} + x_2^8 + 2x_4^4). \]


Proposition 2.162. If $|X| = n$ and $G$ is a subgroup of $S_n$, then the number of
$(q, G)$-colorings of $X$ is $P_G(q, \ldots, q)$, where $P_G(x_1, \ldots, x_n)$ is the cycle index.

Proof. By Theorem 2.160, the number of $(q, G)$-colorings of $X$ is

\[ \frac{1}{|G|} \sum_{\tau \in G} q^{t(\tau)}, \]

where $t(\tau)$ is the number of cycles in the complete factorization of $\tau$. On the
other hand,

\[ P_G(x_1, \ldots, x_n) = \frac{1}{|G|} \sum_{\tau \in G} \operatorname{ind}(\tau) = \frac{1}{|G|} \sum_{\tau \in G} x_1^{e_1(\tau)} x_2^{e_2(\tau)} \cdots x_n^{e_n(\tau)}, \]

and so

\[ P_G(q, \ldots, q) = \frac{1}{|G|} \sum_{\tau \in G} q^{e_1(\tau) + e_2(\tau) + \cdots + e_n(\tau)} = \frac{1}{|G|} \sum_{\tau \in G} q^{t(\tau)}. \quad \bullet \]


Let us count again the number of red-and-black chessboards with sixteen
squares in Example 2.159. Here,

\[ P_G(x_1, \ldots, x_{16}) = \tfrac{1}{4}(x_1^{16} + x_2^8 + 2x_4^4), \]

and so the number of chessboards is

\[ P_G(2, \ldots, 2) = \tfrac{1}{4}(2^{16} + 2^8 + 2 \cdot 2^4). \]
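The cycle index of the rotation group of the 4 × 4 grid can also be assembled mechanically. The sketch below is an assumption-laden illustration: squares are numbered 0 to 15 row by row, the clockwise rotation is encoded as the tuple `R`, a monomial is represented by its exponent tuple $(e_1, \ldots, e_n)$, and all function names are ours.

```python
from collections import Counter
from fractions import Fraction

def compose(p, q):
    """Return p∘q (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(q)))

def cycle_type(perm):
    """Return (e_1, ..., e_n), where e_r is the number of r-cycles of perm."""
    n = len(perm)
    exponents = [0] * n
    seen = set()
    for start in range(n):
        if start not in seen:
            length, s = 0, start
            while s not in seen:
                seen.add(s)
                s = perm[s]
                length += 1
            exponents[length - 1] += 1
    return tuple(exponents)

def cycle_index(group):
    """Map each monomial (as an exponent tuple) to its coefficient in P_G."""
    counts = Counter(cycle_type(g) for g in group)
    return {mono: Fraction(c, len(group)) for mono, c in counts.items()}

def evaluate(index, values):
    """Evaluate the cycle index at x_1 = values[0], x_2 = values[1], ..."""
    total = Fraction(0)
    for mono, coeff in index.items():
        term = coeff
        for v, e in zip(values, mono):
            term *= v ** e
        total += term
    return total

# Clockwise rotation: the square in row r, column c goes to row c, column 3 - r.
R = tuple(c * 4 + (3 - r) for r in range(4) for c in range(4))
identity = tuple(range(16))
R2 = compose(R, R)
R3 = compose(R2, R)
G = [identity, R, R2, R3]

index = cycle_index(G)
print(evaluate(index, [2] * 16))
```

Evaluating the same `index` at `[q] * 16` gives the number of chessboards with `q` colors, in line with Proposition 2.162.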


The reason we have introduced the cycle index is that it allows us to state
Pólya's generalization of Burnside's lemma, which solves the following sort of
problem. How many blue-and-white flags with 9 stripes have 4 blue stripes and
5 white stripes? More generally, we want to count the number of orbits in which
we prescribe the number of "stripes" of any given color.


Theorem 2.163 (Pólya). Let $G \le S_X$, where $|X| = n$, let $|C| = q$, and, for
each $i \ge 1$, define $\sigma_i = c_1^i + \cdots + c_q^i$. Then the number of $(q, G)$-colorings of
$X$ having $f_r$ elements of color $c_r$, for every $r$, is the coefficient of $c_1^{f_1} c_2^{f_2} \cdots c_q^{f_q}$
in $P_G(\sigma_1, \ldots, \sigma_n)$.

Proofs of Pólya's theorem can be found in combinatorics books (for example,
see Biggs, Discrete Mathematics). To solve the flag problem posed above, first
note that the cycle index for blue-and-white flags having 9 stripes is

\[ P_G(x_1, \ldots, x_9) = \tfrac{1}{2}(x_1^9 + x_1 x_2^4), \]

and so the number of flags is $P_G(2, \ldots, 2) = \tfrac{1}{2}(2^9 + 2^5) = 272$. Using Pólya's
theorem, the number of flags with 4 blue stripes and 5 white ones is the coefficient
of $b^4 w^5$ in

\[ P_G(\sigma_1, \ldots, \sigma_9) = \tfrac{1}{2}\left[(b + w)^9 + (b + w)(b^2 + w^2)^4\right]. \]

A calculation using the binomial theorem shows that the coefficient of $b^4 w^5$
is 66.
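The coefficient can be obtained two independent ways, which makes a useful cross-check. In this sketch (our own helper names; a polynomial in $b$ and $w$ is stored as a dictionary from exponent pairs to coefficients), the first computation expands the two products, and the second simply enumerates the flags.

```python
from itertools import combinations

def poly_mul(p, q):
    """Multiply two polynomials in b, w stored as {(deg_b, deg_w): coefficient}."""
    out = {}
    for (a1, b1), c1 in p.items():
        for (a2, b2), c2 in q.items():
            key = (a1 + a2, b1 + b2)
            out[key] = out.get(key, 0) + c1 * c2
    return out

def poly_pow(p, k):
    """Raise a polynomial to the k-th power by repeated multiplication."""
    result = {(0, 0): 1}
    for _ in range(k):
        result = poly_mul(result, p)
    return result

sigma1 = {(1, 0): 1, (0, 1): 1}   # b + w
sigma2 = {(2, 0): 1, (0, 2): 1}   # b^2 + w^2

first = poly_pow(sigma1, 9)                     # (b + w)^9
second = poly_mul(sigma1, poly_pow(sigma2, 4))  # (b + w)(b^2 + w^2)^4
coefficient = (first.get((4, 5), 0) + second.get((4, 5), 0)) // 2

def count_flags(n_stripes, n_blue):
    """Brute force: flags with n_blue blue stripes, counted up to turning over."""
    flags = set()
    for blue in combinations(range(n_stripes), n_blue):
        flag = tuple("b" if i in blue else "w" for i in range(n_stripes))
        flags.add(min(flag, flag[::-1]))  # one representative per orbit
    return len(flags)

print(coefficient, count_flags(9, 4))
```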
Exercises

2.125 How many flags are there with $n$ stripes each of which can be colored any one of
$q$ given colors?

2.126 Let $X$ be the squares in an $n \times n$ grid, and let $\rho$ be a rotation by 90°. Define a
chessboard to be a $(q, G)$-coloring, where the cyclic group $G = \langle \rho \rangle$ of order 4 is
acting. Show that the number of chessboards is

\[ \tfrac{1}{4}\left(q^{n^2} + q^{\lfloor (n^2+1)/2 \rfloor} + 2q^{\lfloor (n^2+3)/4 \rfloor}\right), \]

where $\lfloor x \rfloor$ is the greatest integer in the number $x$.

2.127 Let $X$ be a disk divided into $n$ congruent circular sectors, and let $\rho$ be a rotation by
$(360/n)°$. Define a roulette wheel to be a $(q, G)$-coloring, where the cyclic group
$G = \langle \rho \rangle$ of order $n$ is acting. Prove that if $n = 6$, then there are
$\tfrac{1}{6}(2q + 2q^2 + q^3 + q^6)$ roulette wheels having 6 sectors.

[The formula for the number of roulette wheels with $n$ sectors is

\[ \frac{1}{n} \sum_{d \mid n} \varphi(n/d)\, q^d, \]

where $\varphi$ is the Euler $\varphi$-function.]

2.128 Let $X$ be the vertices of a regular $n$-gon, and let the dihedral group $G = D_{2n}$ act
(as the usual group of symmetries [see Example 2.62]). Define a bracelet to be a
$(q, G)$-coloring of a regular $n$-gon, and call each of its vertices a bead. (Not only
can one rotate a bracelet; one can also flip it.)

(i) How many bracelets are there having 5 beads, each of which can be colored
any one of $q$ available colors?

(ii) How many bracelets are there having 6 beads, each of which can be colored
any one of $q$ available colors?

(iii) How many bracelets are there with exactly 6 beads having 1 red bead, 2
white beads, and 3 blue beads?


3.1 First Properties

In high school algebra, one is usually presented with a list of “rules” for ordinary 
addition and multiplication of real numbers; these lists¹ are often quite long, having
perhaps 20 or more items. For example, one rule is the additive cancellation
law:

if $a + c = b + c$, then $a = b$.

Some rules, as this one, follow from properties of subtraction (just subtract $c$
from both sides), but there are also rules involving two operations. One such is
the distributive law:

\[ (a + b)c = ac + bc; \]

when read from left to right, it says that c can be “multiplied through” a + b; 
when read from right to left, it says that c can be “factored out” of ac + bc. There 
is also the “mysterious” rule: 


\[ (-1) \times (-1) = 1, \tag{M} \]

which involves both multiplication and subtraction. Lists of rules can be shrunk 
by deleting redundant items, but there is a good reason for so shrinking them 
aside from the obvious economy provided by a shorter list: a short list makes it 
easier to see analogies between numbers and other realms (such as polynomials) 
in which one can both add and multiply. Before exploring such other realms, let 
us dispel the mystery of (M). 

¹ For example, see H. S. Hall and S. R. Knight, Algebra for Colleges and Schools, Macmillan,
1923, or J. C. Stone and V. S. Mallory, A Second Course in Algebra, Sanborn, 1937.

Lemma 3.1. $0 \cdot a = 0$ for every number $a$.

Proof. Since 0 = 0 + 0, the distributive law gives 

0 • a = (0 + 0) • a = (0 • a) + (0 • a). 

Now subtract 0 • a from both sides (that is, use the additive cancellation law) to 
get 0 = 0 • a. • 

Incidentally, we can now see why dividing by 0 is forbidden: given a number
$b$, its reciprocal $1/b$ must satisfy $b(1/b) = 1$. In particular, $1/0$ would be a
number satisfying $0 \cdot (1/0) = 1$. But Lemma 3.1 gives $0 \cdot (1/0) = 0$, contradicting
$1 \ne 0$.

Lemma 3.2. If $-a$ is that number which, when added to $a$, gives 0, then
$(-1)(-a) = a$.

Proof. The distributive law and Lemma 3.1 give

\[ 0 = 0 \cdot (-a) = (-1 + 1)(-a) = (-1)(-a) + (-a); \]

now add $a$ to both sides (the additive cancellation law again) to get $a = (-1)(-a)$. •

Setting $a = 1$ gives the (no longer) mysterious (M).

While we are proving elementary properties, let us show that, fortunately,
the product $(-1)a$ is the same as $-a$.

Corollary 3.3. $(-1)a = -a$ for every number $a$.

Proof. By Lemma 3.2, $(-1)(-a) = a$. Multiplying both sides by $-1$ gives

\[ (-1)(-1)(-a) = (-1)a. \]

But Lemma 3.2 gives $(-1)(-1) = 1$, so that $-a = (-1)a$. •

Mathematical objects other than numbers can be added and multiplied. For
example, in calculus, one adds and multiplies functions. Now the constant function
$\varepsilon(x) = 1$ behaves just like the number 1 under multiplication. Is the analog
of Lemma 3.2 true; is $[-\varepsilon(x)][-f(x)] = f(x)$? The answer is yes, and the
proof of this fact is exactly the same as the proof just given for numbers: just
replace every occurrence of the letter $a$ by $f(x)$ and the numeral 1 by $\varepsilon$.

We now focus on certain simple properties enjoyed by ordinary addition 
and multiplication, elevating them to the status of axioms. In essence, we are 
describing more general realms in which we shall be working. 





Commutative Rings I Ch. 3 


Definition. A commutative ring² $R$ is a set with two operations, addition and
multiplication, such that:

(i) $a + b = b + a$ for all $a, b \in R$;

(ii) $a + (b + c) = (a + b) + c$ for all $a, b, c \in R$;

(iii) there is an element $0 \in R$ with $0 + a = a$ for all $a \in R$;

(iv) for each $a \in R$, there is $a' \in R$ with $a' + a = 0$;

(v) $ab = ba$ for all $a, b \in R$;

(vi) $a(bc) = (ab)c$ for every $a, b, c \in R$;

(vii) there is an element $1 \in R$, called one (or the unit), with $1a = a$ for every
$a \in R$;

(viii) $a(b + c) = ab + ac$ for every $a, b, c \in R$.

Of course, axioms (i) through (iv) say that $R$ is an abelian group under addition.
Addition and multiplication in a commutative ring $R$ are operations, so
there are functions

\[ \alpha : R \times R \to R \quad \text{with} \quad \alpha(r, r') = r + r' \in R \]

and

\[ \mu : R \times R \to R \quad \text{with} \quad \mu(r, r') = rr' \in R \]

for all $r, r' \in R$. The law of substitution holds here, as it does for any operation:
if $r = r'$ and $s = s'$, then $r + s = r' + s'$ and $rs = r's'$. For example, the proof
of Lemma 3.1 begins with $\mu(0, a) = \mu(0 + 0, a)$, and the proof of Lemma 3.2
begins with $\mu(0, -a) = \mu(-1 + 1, -a)$.

Example 3.4. 


(i) The reader may assume that $\mathbb{Z}$, $\mathbb{Q}$, $\mathbb{R}$, and $\mathbb{C}$ are commutative rings with the
usual addition and multiplication (the ring axioms are verified in courses
in the foundations of mathematics).

² This term was probably coined by D. Hilbert, in 1897, when he wrote Zahlring. One
of the meanings of the word ring, in German as in English, is collection, as in the phrase "a
ring of thieves." (It has also been suggested that Hilbert used this term because, for a ring
of algebraic integers, an appropriate power of each element "cycles back" to being a linear
combination of lower powers.)

³ Some authors do not demand that commutative rings have 1. For them, the set of all even
integers is a commutative ring, but we do not recognize it as such. They refer to our rings as
commutative rings with one.

(ii) Let $\mathbb{Z}[i]$ be the set of all complex numbers of the form $a + bi$, where
$a, b \in \mathbb{Z}$ and $i^2 = -1$. It is a boring exercise to check that $\mathbb{Z}[i]$ is, in fact,
a commutative ring (this exercise will be significantly shortened once the
notion of subring has been introduced). $\mathbb{Z}[i]$ is called the ring of Gaussian
integers.

(iii) Consider the set $R$ of all real numbers $x$ of the form

\[ x = a + b\omega, \]

where $a, b \in \mathbb{Q}$ and $\omega = \sqrt[3]{2}$. It is easy to see that $R$ is closed under
ordinary addition, but we claim that $R$ is not closed under multiplication.
If $\omega^2 \in R$, then there are rationals $a$ and $b$ with

\[ \omega^2 = a + b\omega. \]

Multiplying both sides by $\omega$ gives the equations:

\[
\begin{aligned}
2 &= a\omega + b\omega^2 \\
  &= a\omega + b(a + b\omega) \\
  &= a\omega + ab + b^2\omega \\
  &= ab + (a + b^2)\omega.
\end{aligned}
\]

If $a + b^2 = 0$, then $a = -b^2$, and the last equation gives $2 = ab$; hence,
$2 = (-b^2)b = -b^3$. But this says that the cube root of 2 is rational,
contradicting Exercise 1.49(ii) on page 51. Therefore, $a + b^2 \ne 0$ and
$\omega = (2 - ab)/(a + b^2)$. Since $a$ and $b$ are rational, we have $\omega$ rational,
again contradicting Exercise 1.49(ii). Therefore, $R$ is not closed under
multiplication, and so $R$ is not a commutative ring. ◄


Remark. In the term commutative ring, the adjective modifies the operation
of multiplication, for commutativity of addition is part of the general concept of
ring. There are noncommutative rings; that is, there are sets with addition and
multiplication satisfying all the axioms of a commutative ring except the commutativity
axiom: $ab = ba$. [Actually, the definition replaces the axiom $1a = a$
by $1a = a = a1$, and it replaces the distributive law by two distributive laws,
one on either side: $a(b + c) = ab + ac$ and $(b + c)a = ba + ca$.] For example, let
$M$ denote the set of all 2 × 2 matrices with real entries. Example 2.48(i) defines
multiplication of matrices, and we now define addition by

\[
\begin{bmatrix} a & b \\ c & d \end{bmatrix} + \begin{bmatrix} a' & b' \\ c' & d' \end{bmatrix} = \begin{bmatrix} a + a' & b + b' \\ c + c' & d + d' \end{bmatrix}.
\]

It is easy to see that $M$, equipped with this addition and multiplication, satisfies
all the new ring axioms except the commutativity of multiplication.

Even though there are interesting examples of noncommutative rings, we
shall consider only commutative rings in this book. ◄

Proposition 3.5. Lemma 3.1, Lemma 3.2, and Corollary 3.3 hold for every 
commutative ring. 

Proof. Each of these results can be proved using only the defining axioms of a
commutative ring. To illustrate, here is a very fussy proof of Lemma 3.1: if $R$ is
a commutative ring and $a \in R$, then $0 \cdot a = 0$.

Since 0 = 0 + 0, the distributive law gives 

0 • a = (0 + 0) • a = (0 • a) + (0 • a). 

Now add —(0 • a) to both sides: 

— (0 • a) + (0 • a) = —(0 • a) + [(0 • a) + (0 • a )]. 

The defining property of — (0 • a) gives the left side — (0 • a) + (0 • a) = 0, and so 

0 = — (0 • a) + [(0 • a) + (0 • a)]. 

We use associativity to simplify the right side. 

0 = —(0 • a) + [(0 • a) + (0 • a)] 

= [— (0 • a) + (0 • a)] + (0 • a) 

= 0 + (0 • a ) 

= 0 • a. •

It is unusual to give such a detailed proof, for it tends to make a simple 
idea look difficult. You should regard a proof as an explanation why a statement 
is true. But an explanation depends on whom you are talking to: you would 
probably give one explanation to a beginning high school student, another to one 
of your classmates, and yet another to your professor. As a rule of thumb, your 
proofs should be directed toward your peers, one of whom is yourself. Make your 
proof as clear as possible, not too long, not too short. If your proof is challenged, 
you must be prepared to explain further, so try to anticipate challenges by giving 
enough details in your original proof. 

What have we shown? Formulas such as (— 1)(— a) = a hold, not because 
of the nature of the numbers a and 1 , not because of the particular definitions of 
the operations of addition and multiplication, but merely as consequences of the 
axioms for addition and multiplication stated in the definition of commutative
ring. For example, we shall see, in Proposition 3.6, that the binomial theorem
holds in every commutative ring. Once we see that all functions $\mathbb{R} \to \mathbb{R}$ form a
commutative ring [Example 3.11], it will then follow that the binomial theorem
$(f + g)^n = \sum_{k=0}^{n} \binom{n}{k} f^k g^{n-k}$ holds for all functions $f, g : \mathbb{R} \to \mathbb{R}$. Thus, a theorem
about commutative rings applies not only to numbers but to other realms as
well, thereby proving many theorems all at once instead of one at a time. The 
abstract approach allows us to be more efficient; the same result need not be 
proved over and over again. There is a second advantage of abstraction. The 
things one adds and multiplies may be very complicated, but many properties 
may be consequences of the rules of manipulating them and not of their intrinsic 
structure. Thus, as we have seen when we studied groups, the abstract approach 
allows us to focus on the essential parts of a problem; we need not be distracted 
by any features irrelevant to it. 

Definition. If R is a commutative ring and a,b e R, then subtraction is defined 
by 

a — b = a + (— b). 


In light of Corollary 3.3,

\[ a - b = a + (-1)b. \]

Here is one more ultrafussy proof (we shall not be so fussy again!): the
distributive law $ca - cb = c(a - b)$ holds for subtraction.

\[
\begin{aligned}
a(b - c) &= a[b + (-1)c] = ab + a[(-1)c] \\
         &= ab + [a(-1)]c = ab + [(-1)a]c \\
         &= ab + (-1)(ac) = ab - ac.
\end{aligned}
\]

If $R$ is a commutative ring and $r \in R$, it is natural to denote $rr$ as $r^2$ and $rrr$
as $r^3$. Similarly, it is natural to denote $r + r$ as $2r$ and $r + r + r$ as $3r$. Here is
the formal definition.

Definition. Let $R$ be a commutative ring, let $a \in R$, and let $n \in \mathbb{N}$. Define
$0a = 0$ (the 0 on the left is the number zero, while the 0 on the right is the zero
element of $R$), and define $(n + 1)a = na + a$. Define $(-n)a = -(na)$.

Thus, if $n \in \mathbb{N}$, we have $na = a + a + \cdots + a$, where there are $n$ summands.

It is easy to see that $(-n)a = -(na) = n(-a)$. The element $n^* = \varepsilon + \cdots + \varepsilon$,
where $\varepsilon$ is the one in a commutative ring $R$, has the property that $na$, as defined
above, is equal to $n^* a$. Thus, $na$, the product of a natural number and a ring
element, can also be viewed as the product of two ring elements.
Proposition 3.6 (Binomial Theorem). If $a, b \in R$, where $R$ is a commutative
ring, then for all $n \ge 0$,

\[ (a + b)^n = \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k}. \]

Proof. Adapt the proof of Proposition 1.15, the binomial theorem in $\mathbb{Z}$. In
particular, define $a^0 = 1$ for every $a \in R$, even for $a = 0$. •
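To see the binomial theorem at work in a commutative ring other than $\mathbb{Z}$, one can test it in the Gaussian integers. The sketch below is only an illustration under our own assumptions: the class `Gauss` models $a + bi$ with integer $a, b$, and `scalar(n, x)` implements the element $nx = x + \cdots + x$ from the definition above by repeated addition rather than by ordinary integer multiplication.

```python
from math import comb

class Gauss:
    """A Gaussian integer a + bi with integer a and b (illustrative model)."""

    def __init__(self, a, b=0):
        self.a, self.b = a, b

    def __add__(self, other):
        return Gauss(self.a + other.a, self.b + other.b)

    def __mul__(self, other):
        # (a + bi)(c + di) = (ac - bd) + (ad + bc)i
        return Gauss(self.a * other.a - self.b * other.b,
                     self.a * other.b + self.b * other.a)

    def __eq__(self, other):
        return (self.a, self.b) == (other.a, other.b)

    def __repr__(self):
        return f"Gauss({self.a}, {self.b})"

def power(x, n):
    """x^n, with x^0 = 1 by convention, even for x = 0."""
    result = Gauss(1)
    for _ in range(n):
        result = result * x
    return result

def scalar(n, x):
    """nx = x + ... + x with n summands, for a natural number n."""
    result = Gauss(0)
    for _ in range(n):
        result = result + x
    return result

def binomial_sum(a, b, n):
    """Right-hand side of the binomial theorem: sum of C(n, k) a^k b^(n-k)."""
    total = Gauss(0)
    for k in range(n + 1):
        total = total + scalar(comb(n, k), power(a, k) * power(b, n - k))
    return total

a, b = Gauss(2, -3), Gauss(-1, 5)
print(all(power(a + b, n) == binomial_sum(a, b, n) for n in range(7)))
```

A finite check like this proves nothing, of course; it only illustrates that the identity depends on the ring axioms and not on the elements being ordinary integers.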

In the definition of commutative ring, we did not insist that $1 \ne 0$.


Proposition 3.7. If $R$ is a commutative ring in which $1 = 0$, then $R$ has only
one element: $R = \{0\}$. One calls $R$ the zero ring.

Proof. If $r \in R$, then $r = 1r = 0r = 0$, by Proposition 3.5. •

The zero ring arises occasionally, but we agree that it is not very interesting. 


Definition. An integral domain is a commutative ring $R$ with $1 \ne 0$ which
satisfies an extra axiom, the cancellation law for multiplication:

if $ca = cb$ and $c \ne 0$, then $a = b$.

We will consistently abbreviate this term to domain (unless it occurs in a context
in which it might be confused with the domain of some function).

The familiar examples of commutative rings, $\mathbb{Z}$, $\mathbb{Q}$, $\mathbb{R}$, $\mathbb{C}$, are domains, but
we shall soon exhibit honest examples of commutative rings that are not domains.


Proposition 3.8. A commutative ring R is a domain if and only if it is not the 
zero ring and the product of any two nonzero elements of R is nonzero. 

Proof. Assume that $R$ is a domain, so that the cancellation law holds. Suppose,
by way of contradiction, that there are nonzero elements $a, b \in R$ with $ab = 0$.
Proposition 3.5 gives $0 \cdot b = 0$, so that $ab = 0 \cdot b$. The cancellation law now
gives $a = 0$ (for $b \ne 0$), and this is a contradiction.

Conversely, assume that the product of nonzero elements in $R$ is always
nonzero. If $ca = cb$ with $c \ne 0$, then $0 = ca - cb = c(a - b)$. Since $c \ne 0$,
the hypothesis that the product of nonzero elements is nonzero forces $a - b = 0$.
Therefore, $a = b$, as desired. •

Definition. A subset $S$ of a commutative ring $R$ is a subring of $R$ if:

(i) $1 \in S$;⁴

(ii) if $a, b \in S$, then $a - b \in S$;

(iii) if $a, b \in S$, then $ab \in S$.

Just as a subgroup is a group in its own right, so is a subring of a commutative 
ring a commutative ring in its own right. 


Proposition 3.9. A subring S of a commutative ring R is itself a commutative 
ring. 

Proof. By hypothesis, $1 \in S$ and axiom (vii) in the definition of commutative
ring on page 216 holds. We now show that $S$ is closed under addition; that is,
if $s, s' \in S$, then $s + s' \in S$. Axiom (ii) in the definition of subring gives
$0 = 1 - 1 \in S$. Another application of this axiom shows that if $b \in S$, then
$0 - b = -b \in S$; finally, if $a, b \in S$, then Corollary 3.3 shows that $S$ contains

\[
\begin{aligned}
a - (-b) &= a + (-1)(-b) \\
         &= a + (-1)(-1)b \\
         &= a + b.
\end{aligned}
\]

Thus, $S$ is closed under addition and multiplication. It contains 1 and 0 and, for
each $s \in S$, it contains $-s$. All the other axioms in the definition of commutative
ring are inherited by $S$ from their holding in the commutative ring $R$. For
example, we know that the distributive law $a(b + c) = ab + ac$ holds for all
$a, b, c \in R$. In particular, this equation holds for all $a, b, c \in S \subseteq R$, and so the
distributive law holds in $S$. •

To verify that a set S is a commutative ring requires checking ten items: closure
under addition and multiplication and eight axioms; to verify that a subset
S of a commutative ring is a subring requires checking only three items, which
is obviously more economical. For example, it is simpler to show that the ring
of Gaussian integers,

ℤ[i] = {z ∈ ℂ : z = a + ib with a, b ∈ ℤ},

is a subring of ℂ than to verify all the axioms in the definition of a commutative
ring. Of course, one must first have shown that ℂ is a commutative ring.
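The three subring conditions for the Gaussian integers can be spot-checked numerically. The following is only a sketch: it represents a + ib by the pair (a, b), and the names g_sub and g_mul are ours, not notation from the text. A finite check illustrates the closure conditions; it does not replace the proof.

```python
from typing import Tuple

Gauss = Tuple[int, int]  # (a, b) stands for a + ib with a, b integers


def g_sub(z: Gauss, w: Gauss) -> Gauss:
    """Difference (a + ib) - (c + id); both coordinates stay integers."""
    return (z[0] - w[0], z[1] - w[1])


def g_mul(z: Gauss, w: Gauss) -> Gauss:
    """Product (a + ib)(c + id) = (ac - bd) + i(ad + bc)."""
    a, b = z
    c, d = w
    return (a * c - b * d, a * d + b * c)


one: Gauss = (1, 0)            # condition (i): 1 = 1 + 0i lies in Z[i]
print(g_sub((1, 2), (3, 4)))   # condition (ii): difference is again a pair of integers
print(g_mul((1, 2), (3, 4)))   # condition (iii): product is again a pair of integers
```

Since the formulas for difference and product involve only sums, differences, and products of integers, both results always have integer coordinates, which is the content of conditions (ii) and (iii).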

4 The even integers do not form a subring of Z because 1 is not even. Their special structure 
will be recognized when ideals are introduced. 
Example 3.10.

If n ≥ 3 is an integer, let ζ_n = e^{2πi/n} be a primitive nth root of unity, and define

ℤ[ζ_n] = {z ∈ ℂ : z = a_0 + a_1 ζ_n + a_2 ζ_n² + ··· + a_{n−1} ζ_n^{n−1}, all a_i ∈ ℤ}.

When n = 4, then ℤ[ζ_4] is the Gaussian integers ℤ[i]. It is easy to check that
ℤ[ζ_n] is a subring of ℂ; to prove that ℤ[ζ_n] is closed under multiplication, note
that if m ≥ n, then m = qn + r, where 0 ≤ r < n, and ζ_n^m = ζ_n^{qn+r} = ζ_n^r. ◄

Here is an example of a commutative ring that is not a domain. 

Example 3.11. 


(i) Let 𝓕(ℝ) be the set of all the functions ℝ → ℝ equipped with the
operations of pointwise addition and pointwise multiplication: for functions
f, g ∈ 𝓕(ℝ), define new functions f + g and fg by

f + g : a ↦ f(a) + g(a) and fg : a ↦ f(a)g(a)

(notice that fg is not their composite).

Pointwise addition and pointwise multiplication are precisely those
operations on functions that occur in calculus. For example, recall the
product rule for derivatives:

(fg)′ = f′g + fg′.

The + in the sum f′g + fg′ is pointwise addition, and f′g is the pointwise
product of f′ and g.

We claim that 𝓕(ℝ) with these operations is a commutative ring. Verification
of the axioms is left to the reader with the following hint: the zero
in 𝓕(ℝ) is the constant function z with z(a) = 0 for all a ∈ ℝ, and the one
is the constant function ε with ε(a) = 1 for all a ∈ ℝ.




Figure 3.1. 𝓕(ℝ) is not a domain.
We now show that 𝓕(ℝ) is not a domain. Define f and g by:

f(a) = a if a ≤ 0,
f(a) = 0 if a ≥ 0;

g(a) = 0 if a ≤ 0,
g(a) = a if a ≥ 0.

Clearly, neither f nor g is zero (i.e., f ≠ z and g ≠ z). On the other
hand, for each a ∈ ℝ, fg : a ↦ f(a)g(a) = 0, because at least one
of the factors f(a) or g(a) is the number zero. Therefore, fg = z, by
Proposition 2.2, and 𝓕(ℝ) is not a domain.


(ii) Recall that a function f : ℝ → ℝ is differentiable if f′(a) exists for all
a ∈ ℝ. Let 𝒟(ℝ) = {all differentiable functions f : ℝ → ℝ}. We claim
that 𝒟(ℝ) is a subring of 𝓕(ℝ). Now ε lies in 𝒟(ℝ), for ε′ = z. If
f, g ∈ 𝒟(ℝ), then f − g ∈ 𝒟(ℝ), for (f − g)′ = f′ − g′, while (fg)′
exists, by the product rule. Therefore, 𝒟(ℝ) is a subring of 𝓕(ℝ), and so
𝒟(ℝ) is a ring in its own right, by Proposition 3.9. ◄
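The zero divisors in part (i) can be illustrated with a short Python sketch. Here f(a) = a for a ≤ 0 and f(a) = 0 otherwise, g(a) = 0 for a ≤ 0 and g(a) = a otherwise, and pointwise_product is a helper name of our own choosing. Sampling finitely many points only illustrates the claim fg = z; the argument in the text is what proves it.

```python
def f(a):
    """f(a) = a when a <= 0, and 0 otherwise."""
    return a if a <= 0 else 0


def g(a):
    """g(a) = 0 when a <= 0, and a otherwise."""
    return 0 if a <= 0 else a


def pointwise_product(u, v):
    """Return the function a -> u(a) * v(a); this is not the composite of u and v."""
    return lambda a: u(a) * v(a)


fg = pointwise_product(f, g)
samples = [-3, -1, 0, 1, 3]

print([f(a) for a in samples])   # f is not the zero function
print([g(a) for a in samples])   # g is not the zero function
print([fg(a) for a in samples])  # every sampled value of fg is 0
```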


Proposition 3.12. 

(i) I_m, the integers mod m, is a commutative ring.

(ii) The commutative ring I_m is a domain if and only if m is a prime.

Proof.

(i) In Theorem 2.101, we proved that there is an addition defined on I_m, namely,
[a] + [b] = [a + b], which satisfies axioms (i) through (iv) in the definition of
commutative ring ([a] is the congruence class [a] = {b ∈ ℤ : b ≡ a mod m}). In
Theorem 2.103, we proved that there is a multiplication defined on I_m, namely,
[a][b] = [ab], which satisfies axioms (v) through (vii). Only the distributive law
needs checking. Since distributivity does hold in ℤ, we have

[a]([b] + [c]) = [a][b + c]
               = [a(b + c)]
               = [ab + ac]
               = [ab] + [ac]
               = [a][b] + [a][c].

Therefore, I_m is a commutative ring.

(ii) If m is not a prime, then m = ab, where 0 < a, b < m. Now both [a] and
[b] are not [0] in I_m, because m divides neither a nor b, but [a][b] = [m] = [0].
Thus, I_m is not a domain.

Conversely, suppose that m is prime. Since m ≥ 2, we have [1] ≠ [0]. If
[a][b] = [0], then ab ≡ 0 mod m, that is, m | ab. Since m is a prime, Euclid’s
lemma gives m | a or m | b; that is, a ≡ 0 mod m or b ≡ 0 mod m; that is,
[a] = [0] or [b] = [0]. Therefore, I_m is a domain. •
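Part (ii) can be checked by brute force for small moduli. The sketch below represents the class [a] by its representative a with 0 ≤ a < m; the function names zero_divisor_pairs and is_domain are ours, chosen for illustration.

```python
def zero_divisor_pairs(m):
    """All pairs (a, b) with 0 < a <= b < m and [a][b] = [0] in I_m."""
    return [(a, b)
            for a in range(1, m)
            for b in range(a, m)
            if (a * b) % m == 0]


def is_domain(m):
    """I_m is a domain when m >= 2 (so [1] != [0]) and no nonzero classes multiply to [0]."""
    return m >= 2 and not zero_divisor_pairs(m)


print(zero_divisor_pairs(6))                       # [(2, 3), (3, 4)]
print([m for m in range(2, 20) if is_domain(m)])   # [2, 3, 5, 7, 11, 13, 17, 19]
```

The moduli that pass the test are exactly the primes below 20, in agreement with the proposition.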
For example, I_6 is not a domain because [2] ≠ [0] and [3] ≠ [0], yet
[2][3] = [6] = [0].

Many theorems of ordinary arithmetic, that is, properties of the commutative 
ring Z, hold in more generality. We now generalize some familiar definitions 
from Z to arbitrary commutative rings. 

Definition. Let a and b be elements of a commutative ring R. Then a divides
b in R (or a is a divisor of b or b is a multiple of a), denoted⁵ by

a | b,

if there exists an element c ∈ R with b = ca.

As an extreme example, if 0 | a, then a = 0 · b for some b ∈ R. Since
0 · b = 0, however, we must have a = 0. Thus, 0 | a if and only if a = 0.

Notice that whether a | b depends not only on the elements a and b but on the
commutative ring R as well. For example, 3 does divide 2 in ℚ, for 2 = 3 × (2/3)
and 2/3 ∈ ℚ; on the other hand, 3 does not divide 2 in ℤ, because there is no
integer c with 3c = 2.

The reader can quickly check each of the following facts. For every a ∈ R,
we have a | a, 1 | a, −a | a, −1 | a, and a | 0.

Lemma 3.13. Let R be a commutative ring, and let a, b, c be elements of R.

(i) If a | b and b | c, then a | c.

(ii) If a | b and a | c, then a divides every element of the form sb + tc, where
s, t ∈ R.

Proof. Exercises for the reader. •

Definition. If R is a commutative ring and a, b ∈ R, then a linear combination
of them is an element of R of the form sa + tb, where s, t ∈ R.

Thus, Lemma 3.13 says that any common divisor of elements a, b ∈ R must
also divide every linear combination of a and b.

Definition. An element u in a commutative ring R is called a unit if u | 1 in
R, that is, if there exists v ∈ R with uv = 1; the element v is called the⁶ inverse
of u, and v is often denoted by u^{-1}.

An element a ∈ R is an associate of an element r ∈ R if there is a unit
u ∈ R with a = ur.

5 Do not confuse the notations a | b and a/b. The first one denotes the statement “a is a
divisor of b,” whereas the second one denotes an element c ∈ R with bc = a.

6 Uniqueness of the inverse is Exercise 3.2 on page 226.
Example 3.14.

The only units in ℤ are ±1, and the associates of n ∈ ℤ are ±n. ◄

Units are of interest because one can always divide by them. If u is a unit in
R, then there is v ∈ R with uv = 1, and if a ∈ R, then u | a because

a = u(va)

is a factorization of a in R. Thus, it is reasonable to define the quotient a/u as
a/u = va = u^{-1}a. (Recall that this last equation is the reason why zero is never
a unit; that is, why dividing by zero is forbidden.)

Just as divisibility depends on the commutative ring R, so does the question
whether an element u ∈ R is a unit depend on R (for it is a question whether u | 1
in R). For example, the number 2 is a unit in ℚ, for 1/2 lies in ℚ and 2 × (1/2) = 1,
but 2 is not a unit in ℤ, because there is no integer v with 2v = 1.

The following theorem generalizes Exercise 1.45 on page 51.

Proposition 3.15. Let R be a domain, and let a, b ∈ R be nonzero. Then a | b
and b | a if and only if b = ua for some unit u ∈ R.

Proof. If a | b and b | a, there are elements u, v ∈ R with b = ua and a = vb.
Substituting, b = ua = uvb. Since b = 1b and b ≠ 0, the cancellation law in
the domain R gives 1 = uv, and so u is a unit.

Conversely, assume that b = ua, where u is a unit in R. Plainly, a | b. If
v ∈ R satisfies uv = 1, then vb = vua = a, and so b | a. •

There exist examples of commutative rings R in which the conclusion of 
Proposition 3.15 is false, and so the hypothesis in this proposition that R be a 
domain is needed. 

What are the units in I_m?

Proposition 3.16. If a is an integer, then [a] is a unit in I_m if and only if a and
m are relatively prime. In fact, if sa + tm = 1, then [a]^{-1} = [s].

Proof. If [a] is a unit in I_m, then there is [s] ∈ I_m with [s][a] = [1]. Therefore,
sa ≡ 1 mod m, and so there is an integer t with sa − 1 = tm; hence, 1 = sa − tm.
By Exercise 1.51 on page 52, a and m are relatively prime.

Conversely, if a and m are relatively prime, there are integers s and t with
1 = sa + tm. Hence, sa − 1 = −tm and so sa ≡ 1 mod m. Thus, [s][a] = [1],
and [a] is a unit in I_m. •

Corollary 3.17. If p is a prime, then every nonzero [a] in I_p is a unit.

Proof. If [a] ≠ [0], then a ≢ 0 mod p, and hence p ∤ a. Therefore, a and p
are relatively prime because p is prime. •
Definition. If R is a commutative ring, then the group of units of R is

U(R) = {all units in R}.

It is easy to check that U(R) is a multiplicative group; that is, it is closed
under products and inverses. [We have already met U(I_m) in the proof of
Theorem 2.107.]

The introduction of the commutative ring I_m makes the solution of congruence
problems much more natural. A congruence ax ≡ b mod m in ℤ becomes
an equation [a][x] = [b] in I_m. If [a] is a unit in I_m, that is, if (a, m) = 1,
then it has an inverse [a]^{-1} = [s], and we can divide by it; the solution is
[x] = [a]^{-1}[b] = [s][b] = [sb]. In other words, congruences are solved just as
ordinary linear equations αx = β are solved over ℝ; that is, x = α^{-1}β.
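The recipe [x] = [s][b] can be sketched in Python. The extended Euclidean algorithm below is one standard way to produce integers s and t with sa + tm = 1 when a and m are relatively prime; the function names extended_gcd and solve_congruence are ours, and positive a and m are assumed.

```python
def extended_gcd(a, m):
    """Return (g, s, t) with g = gcd(a, m) and s*a + t*m == g (a, m positive)."""
    old_r, r = a, m
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_r, old_s, old_t


def solve_congruence(a, b, m):
    """Solve a*x = b mod m when [a] is a unit in I_m, returning x with 0 <= x < m."""
    g, s, _ = extended_gcd(a, m)
    if g != 1:
        raise ValueError("[a] is not a unit in I_m, so we cannot divide by it")
    return (s * b) % m  # [x] = [s][b] = [sb]


print(solve_congruence(3, 4, 7))  # 6, since 3 * 6 = 18 = 4 mod 7
```

When (a, m) ≠ 1 the sketch refuses to answer; such congruences may have no solution or several, and they are outside the case treated in the paragraph above.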


Exercises 


3.1 Prove that a commutative ring R has a unique one 1; that is, if e ∈ R satisfies
er = r for all r ∈ R, then e = 1.

*3.2 Let R be a commutative ring.

(i) Prove the additive cancellation law.

(ii) Prove that every a ∈ R has a unique additive inverse: if a + b = 0 and
a + c = 0, then b = c.

(iii) If u ∈ R is a unit, prove that its inverse is unique: if ub = 1 and uc = 1,
then b = c.


3.3 


(i) Prove that subtraction in Z is not an associative operation. 

(ii) Give an example of a commutative ring R in which subtraction is asso- 
ciative. 


3.4 Assume that S is a subset of a commutative ring R such that

(i) 1 ∈ S;

(ii) if a, b ∈ S, then a + b ∈ S;

(iii) if a, b ∈ S, then ab ∈ S.

(In contrast to the definition of subring, we are now assuming a + b ∈ S instead of
a − b ∈ S.) Give an example of a commutative ring R containing such a subset S
which is not a subring of R.

3.5 Find the multiplicative inverses of the nonzero elements in I_11.

*3.6 (i) If X is a set, prove that the Boolean group 𝓑(X) in Example 2.47(ix) with
elements the subsets of X and with addition given by

U + V = (U − V) ∪ (V − U)

is a commutative ring if one defines multiplication

UV = U ∩ V.

One calls 𝓑(X) a Boolean ring.

(ii) Prove that 𝓑(X) contains exactly one unit.

(iii) If Y is a proper subset of X, show that the one in 𝓑(Y) is distinct from
the one in 𝓑(X). Conclude that 𝓑(Y) is not a subring of 𝓑(X).

(iv) Prove that every element in 𝓑(X) satisfies U² = U.

3.7 (i) If R is a domain and a ∈ R satisfies a² = a, prove that either a = 0 or
a = 1.

(ii) Show that the commutative ring 𝓕(ℝ) in Example 3.11(i) contains elements
f ≠ 0, 1 with f² = f.

3.8 Find all the units in the commutative ring 𝓕(ℝ) defined in Example 3.11(i).

*3.9 Generalize the construction of 𝓕(ℝ) to a set X and an arbitrary commutative
ring R: let 𝓕(X, R) be the set of all functions from X to R, with pointwise addition
f + g : x ↦ f(x) + g(x) and pointwise multiplication fg : x ↦ f(x)g(x)
for x ∈ X.

(i) Show that 𝓕(X, R) is a commutative ring.

(ii) Show that if X has at least two elements, then 𝓕(X, R) is not a domain.

(iii) If R is a commutative ring, denote 𝓕(R, R) by 𝓕(R):

𝓕(R) = {all functions R → R}.

Show that 𝓕(I_2) has exactly four elements, and that f + f = 0 for every
f ∈ 𝓕(I_2).

*3.10 (i) If R is a domain and S is a subring of R, prove that S is a domain.

(ii) Prove that ℂ is a domain.

(iii) Prove that ℤ, ℚ, and ℝ are domains.

(iv) Prove that the ring of Gaussian integers is a domain.

*3.11 Prove that the intersection of any family of subrings of a commutative ring R is a
subring of R.

3.12 Prove that the only subring of ℤ is ℤ itself.

3.13 Let a and m be relatively prime integers. Prove that if sa + tm = 1 = s′a + t′m,
then s ≡ s′ mod m. See Exercise 1.51 on page 52.

3.14 (i) Is R = {a + b√2 : a, b ∈ ℤ} a domain?

(ii) Is R = {½(a + b√2) : a, b ∈ ℤ} a domain?

(iii) Using the fact that α = ½(1 + √−19) is a root of x² − x + 5, prove that
R = {a + bα : a, b ∈ ℤ} is a domain.

3.15 Prove that the set of all C^∞-functions is a subring of 𝓕(ℝ). (See Exercise 1.37 on
page 34.)


3.2 Fields 

There is an obvious difference between Q and Z : every nonzero element of Q is 
a unit. 
Definition. A field⁷ F is a commutative ring with 1 ≠ 0 in which every nonzero
element a is a unit; that is, there is a^{-1} ∈ F with a^{-1}a = 1.

The first examples of fields are ℚ, ℝ, and ℂ.

The definition of field can be restated in terms of the group of units; a commutative
ring R is a field if and only if U(R) is the set R^× of nonzero elements
in R. To say this another way, R is a field if and only if R^× is a multiplicative
group.

Proposition 3.18. Every field F is a domain. 

Proof. Assume that ab = ac, where a ≠ 0. Multiplying both sides by a^{-1}
gives a^{-1}ab = a^{-1}ac, and so b = c. •

Of course, the converse of this proposition is false, for ℤ is a domain that is
not a field.

Proposition 3.19. The commutative ring I_m is a field if and only if m is prime.

Proof. If m is prime, then Corollary 3.17 shows that I_m is a field.

Conversely, if m is composite, then Proposition 3.12 shows that I_m is not a
domain. By Proposition 3.18, I_m is not a field. •

Notation. When p is a prime, we will usually denote the field I_p by

𝔽_p.

At the end of this chapter (see Theorem 3.124), we shall prove that there are
finite fields other than 𝔽_p for p prime (a field with four elements is constructed
in Exercise 3.17 on page 232).

When I was a graduate student, one of my fellow students was hired to tutor
a mathematically gifted 10-year-old boy. To illustrate how gifted the boy was,
the tutor described the session in which he introduced 2 × 2 matrices and matrix
multiplication to the boy. The boy’s eyes lit up when he was shown multiplication
by the identity matrix, and he immediately went off in a corner by himself.
In a few minutes, he told his tutor that a matrix with rows (a, b) and (c, d) has a
multiplicative inverse if and only if ad − bc ≠ 0!

7 The derivation of the mathematical usage of the English term field (first used by E. H.
Moore in 1893 in his article classifying the finite fields) as well as the German term Körper and
the French term corps is probably similar to the derivation of the words group and ring: each
word denotes a “realm,” a “body” of things, or a “collection of things.” The word domain
abbreviates the usual English translation integral domain of the German word Integritätsbereich,
a collection of things analogous to integers.
In another session, the boy was shown the definition of a field. He was quite
content as the familiar examples of the rationals, reals, and complex numbers 
were displayed. But when he was shown a field with 2 elements, he became very 
agitated. After carefully checking that every axiom really does hold, he exploded 
in a rage. I tell this story to illustrate how truly surprising and unexpected are the 
finite fields. 

In Chapter 2 we introduced GL(2, ℝ), the group of nonsingular matrices
with entries in ℝ. Afterward, we observed that replacing ℝ by ℚ or by ℂ also
gives a group. We now observe that ℝ can be replaced by any field k: GL(2, k)
is a group for every field k. In particular, GL(2, 𝔽_p) is a finite group for every
prime p.

It was shown in Exercise 3.10 on page 227 that every subring of a domain is
itself a domain. Since fields are domains, it follows that every subring of a field
is a domain. The converse of this exercise is true, and it is much more interesting:
every domain is a subring of a field. In order to prove this result, we recall that
an equivalence relation on a set X is a (binary) relation x ≡ y, where x, y ∈ X,
which is reflexive, symmetric, and transitive.

Given four elements a, b, c, and d in a field F with b ≠ 0 and d ≠ 0, assume
that ab^{-1} = cd^{-1}. Multiply both sides by bd to obtain ad = bc. In other words,
were ab^{-1} written as a/b, then we have just shown that a/b = c/d implies
ad = bc; that is, “cross-multiplication” is valid. Conversely, if ad = bc and
both b and d are nonzero, then multiplication by b^{-1}d^{-1} gives ab^{-1} = cd^{-1},
that is, a/b = c/d.

The proof of the next theorem is a straightforward generalization of the stan- 
dard construction of the field of rational numbers Q from the domain of inte- 
gers Z. 

Lemma 3.20. If R is a domain and X = {(a, b) ∈ R × R : b ≠ 0}, then the
relation ≡ on X, defined by cross-multiplication:

(a, b) ≡ (c, d) if ad = bc,

is an equivalence relation.

Proof. Verifications of reflexivity and of symmetry are easy. For transitivity,
assume that (a, b) ≡ (c, d) and (c, d) ≡ (e, f). Now ad = bc gives adf =
bcf, and cf = de gives bcf = bde; thus, adf = bde. Since R is a domain, we
may cancel the nonzero d to get af = be; that is, (a, b) ≡ (e, f). •

Theorem 3.21. If R is a domain, then there is a field F containing R as a
subring. Moreover, F can be chosen so that each f ∈ F has a factorization
f = ab^{-1} with a, b ∈ R and b ≠ 0.
Proof. Example 2.17(iii) shows that the relation ≡ on {(a, b) ∈ ℤ × ℤ : b ≠ 0},
defined by (a, b) ≡ (c, d) if ad = bc, is an equivalence relation. It is easy to
check that that argument generalizes to show that the relation on X = {(a, b) ∈
R × R : b ≠ 0}, defined in the same way, is an equivalence relation on X.

Denote the equivalence class of (a, b) by [a, b], and define F = X/≡, the
set of all equivalence classes [a, b]. Equip F with the following addition and
multiplication (if we pretend that [a, b] is the fraction a/b, then these are just the
usual formulas):

[a, b] + [c, d] = [ad + bc, bd]

and

[a, b][c, d] = [ac, bd].

Notice that the symbols on the right make sense, for b ≠ 0 and d ≠ 0 imply
bd ≠ 0 because R is a domain. The proof that F is a field is now a series of
routine steps.

Addition F × F → F is well-defined: if [a, b] = [a′, b′] and [c, d] =
[c′, d′], then [ad + bc, bd] = [a′d′ + b′c′, b′d′]. We are told that ab′ = a′b and
cd′ = c′d. Hence,

(ad + bc)b′d′ = adb′d′ + bcb′d′ = (ab′)dd′ + bb′(cd′)
              = a′bdd′ + bb′c′d = (a′d′ + b′c′)bd;

that is, (ad + bc, bd) ≡ (a′d′ + b′c′, b′d′), as desired. A similar computation
shows that multiplication F × F → F is well-defined.

The verification that F is a commutative ring is also routine, and it is left to
the reader, with the remark that the zero element is [0, 1], the one is [1, 1], and
the negative of [a, b] is [−a, b]. If we identify a ∈ R with [a, 1] ∈ F, then it is
easy to see that the family R′ of all such elements is a subring of F:

[1, 1] ∈ R′;

[a, 1] − [c, 1] = [a, 1] + [−c, 1] = [a − c, 1] ∈ R′;

[a, 1][c, 1] = [ac, 1] ∈ R′.

To see that F is a field, observe first that if [a, b] ≠ 0, then a ≠ 0 (for
the zero element of F is [0, 1] = [0, b]). The inverse of [a, b] is [b, a], for
[a, b][b, a] = [ab, ab] = [1, 1].

Finally, if b ≠ 0, then [1, b] = [b, 1]^{-1} (as we have just seen). Therefore, if
[a, b] ∈ F, then [a, b] = [a, 1][1, b] = [a, 1][b, 1]^{-1}. This completes the proof,
for [a, 1] and [b, 1] are in R′. •

The statement of Theorem 3.21 is not quite accurate; the field F does not
contain R as a subring, for R is not even a subset of F. Instead, we proved
that F does contain a subring, namely, R′ = {[a, 1] : a ∈ R}, which strongly
resembles R. We shall make the identification of R and R′ precise once the
notion of isomorphism is introduced (see Example 3.31).
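The construction in the proof can be sketched in Python for the domain R = ℤ. The class name Frac and its method names are ours, chosen for illustration; equality of two objects is decided by cross-multiplication, so an object stands for the equivalence class [a, b] rather than for a single pair. This is a sketch of the construction, not a replacement for a rational-number library.

```python
class Frac:
    """The equivalence class [a, b] of a pair of integers (a, b) with b != 0."""

    def __init__(self, a, b=1):
        if b == 0:
            raise ZeroDivisionError("the second coordinate must be nonzero")
        self.a, self.b = a, b

    def __eq__(self, other):
        # [a, b] = [c, d] if and only if ad = bc (cross-multiplication).
        return self.a * other.b == self.b * other.a

    def __add__(self, other):
        # [a, b] + [c, d] = [ad + bc, bd]
        return Frac(self.a * other.b + self.b * other.a, self.b * other.b)

    def __mul__(self, other):
        # [a, b][c, d] = [ac, bd]
        return Frac(self.a * other.a, self.b * other.b)

    def inverse(self):
        # The inverse of a nonzero [a, b] is [b, a]; Frac raises if a == 0.
        return Frac(self.b, self.a)

    def __repr__(self):
        return f"[{self.a}, {self.b}]"


# Well-definedness: different representatives of the same classes give equal sums.
print(Frac(1, 2) + Frac(1, 3) == Frac(2, 4) + Frac(3, 9))  # True
# Every element factors as [a, 1][b, 1]^{-1}.
print(Frac(3, 4) == Frac(3) * Frac(4).inverse())           # True
```

Note that the products bd never vanish here because ℤ is a domain; the same code over a ring with zero divisors could produce a forbidden second coordinate 0.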
Definition. The field F just constructed from a domain R in Theorem 3.21 is
called the fraction field of R; we denote it by

Frac(R),

and we denote the element [a, b] ∈ Frac(R) by a/b. In particular, the elements
[a, 1] of R′ are denoted by a/1 or, more simply, by a.

Notice that the fraction field of ℤ is ℚ; that is, Frac(ℤ) = ℚ.

Definition. A subfield of a field K is a subring k of K which is also a field. 

Proposition 3.22. 

(i) A subset k of a field K is a subfield if and only if it is a subring that is
closed under inverses; that is, if a ∈ k and a ≠ 0, then a^{-1} ∈ k.

(ii) If {F_i : i ∈ I} is any (possibly infinite) family of subfields of a field K, then
k = ⋂_{i∈I} F_i is a subfield of K.

Proof.

(i) If a subset k of a field K is a subfield, then it obviously contains inverses of its
nonzero elements. Conversely, if k is a subring that contains inverses of nonzero
elements, then it is a field, and hence it is a subfield of K.

(ii) We use part (i). Since any intersection of subrings is itself a subring, by
Exercise 3.11 on page 227, k is a subring of K. If a ∈ k is nonzero, then it
got into k by being in every F_i. But since F_i is a subfield, a^{-1} ∈ F_i, and so
a^{-1} ∈ ⋂_i F_i = k. Therefore, k is a subfield of K. •

Definition. If K is a field, the intersection k of all the subfields of K is called
the prime field of K.

Of course, every field has a unique prime field. In Proposition 3.111, we will
see that every prime field is essentially ℚ or 𝔽_p for some prime p.


Exercises 

3.16 (i) If R is a commutative ring, define the circle operation a ∘ b by

a ∘ b = a + b − ab.

Prove that the circle operation is associative and that 0 ∘ a = a for all
a ∈ R.

(ii) Prove that a commutative ring R is a field if and only if the set

R^# = {r ∈ R : r ≠ 1}

is an abelian group under the circle operation.
*3.17 (R. A. Dean) Define F_4 to be the set of all 2 × 2 matrices

F_4 = {[a, b; b, a + b] : a, b ∈ 𝔽_2},

where [a, b; b, a + b] denotes the matrix with rows (a, b) and (b, a + b).

(i) Prove that F_4 is a commutative ring whose operations are matrix addition
and matrix multiplication.

(ii) Prove that F_4 is a field having exactly 4 elements.

(iii) Show that I_4 is not a field.

3.18 Prove that every domain R with a finite number of elements must be a field. Using
Proposition 3.12, this gives a new proof of sufficiency in Proposition 3.19.

*3.19 Find all the units in the ring ℤ[i] of Gaussian integers.

3.20 Show that F = {a + b√2 : a, b ∈ ℚ} is a field.

*3.21 (i) Show that F = {a + bi : a, b ∈ ℚ} is a field.

(ii) Show that every u ∈ F has a factorization u = αβ^{-1}, where α, β ∈ ℤ[i].
(See Exercise 3.49(ii) on page 249.)

*3.22 If R is a commutative ring, define a relation ≡ on R by a ≡ b if there is a unit
u ∈ R with b = ua.

(i) Prove that ≡ is an equivalence relation.

(ii) If a ≡ b, prove that (a) = (b), where (a) = {ra : r ∈ R}. Conversely,
prove that if R is a domain, then (a) = (b) implies a ≡ b.

3.23 If R is a domain, prove that there is no subfield K of Frac(R) such that

R ⊆ K ⊊ Frac(R).

*3.24 Let k be a field with one e, and let R be the subring

R = {ne : n ∈ ℤ}.

(i) If F is a subfield of k, prove that R ⊆ F.

(ii) Prove that a subfield F of k is the prime field of k if and only if it is the
smallest subfield of k containing R; that is, there is no subfield F′ with
R ⊆ F′ ⊊ F.

(iii) If R is a subfield of k, prove that R is the prime field of k.

3.25 (i) Show that every subfield of ℂ contains ℚ.

(ii) Show that the prime field of ℝ is ℚ.

(iii) Show that the prime field of ℂ is ℚ.

*3.26 (i) For any field F, prove that Σ(2, F) ≅ Aff(1, F), where Σ(2, F) denotes
the stochastic group (defined in Exercise 2.42 on page 144).

(ii) If F is a finite field with q elements, prove that |Σ(2, F)| = q(q − 1).

(iii) Prove that Σ(2, 𝔽_3) ≅ S_3.


3.3 Polynomials 

Even though the reader is familiar with polynomials, we now introduce them 
carefully. One modest consequence is that the mystery surrounding the “un- 
known” x will vanish. 
Informally, a polynomial is an “expression” s_0 + s_1 x + s_2 x² + ··· + s_n x^n.
The key observation is that one should pay attention to where the coefficients of
polynomials live.

Definition. If R is a commutative ring, then a sequence⁸ in R is a function
σ : ℕ → R.

Informally, the expression s_0 + s_1 x + s_2 x² + ··· + s_n x^n corresponds to the
sequence (s_0, s_1, s_2, ..., s_n, 0, 0, ...) of its coefficients.

As any function, a sequence σ is determined by its values; for each i ∈ ℕ,
write σ(i) = s_i ∈ R, so that

σ = (s_0, s_1, s_2, ..., s_i, ...).

The entries s_i ∈ R are called the coefficients of the sequence. The term coefficient
means “acting together to some single end.” Here, coefficients combine
with powers of x to give the terms of a sequence.

By Proposition 2.2, two sequences σ and τ in R are equal if and only if
σ(i) = τ(i) for all i ≥ 0; that is, σ = τ if and only if they have the same
coefficients.

Definition. A sequence σ = (s_0, s_1, ..., s_i, ...) in a commutative ring R is
called a polynomial if there is some integer n ≥ 0 with s_i = 0 for all i > n; that
is,

σ = (s_0, s_1, ..., s_n, 0, 0, ...).

A polynomial has only finitely many nonzero coefficients.

The sequence σ = (0, 0, 0, ...) is a polynomial, called the zero polynomial;
it is denoted by σ = 0.

Definition. If σ ≠ 0 is a polynomial, then there is a natural number n with
s_n ≠ 0 and s_i = 0 for all i > n. One calls s_n the leading coefficient of σ, one
calls n the degree⁹ of σ, and one denotes it by deg(σ).

The zero polynomial 0 does not have a degree because it has no nonzero
coefficients; every other polynomial does have a degree.

Notation. If R is a commutative ring, then the set of all polynomials with
coefficients in R is denoted by R[x].

8 Sequences in R are also called formal power series (see Exercise 3.36 on page 240).

9 The word degree comes from the Latin word meaning “step.”





We will soon prove that a polynomial (s_0, s_1, ..., s_n, 0, 0, ...) of degree n
can be written as s_0 + s_1 x + s_2 x² + ··· + s_n x^n, but, until then, we proceed
formally. Equip R[x] with the following operations. Define

σ + τ = (s_0 + t_0, s_1 + t_1, ..., s_i + t_i, ...)

and

στ = (a_0, a_1, ..., a_k, ...),

where a_k = Σ_{i+j=k} s_i t_j = Σ_{i=0}^{k} s_i t_{k−i}; thus,

στ = (s_0 t_0, s_0 t_1 + s_1 t_0, s_0 t_2 + s_1 t_1 + s_2 t_0, ...).
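These two operations can be sketched in Python with integer coefficients. A polynomial is stored as the finite list of its coefficients up to the leading one, so the zero polynomial is the empty list; this storage convention and the names trim, poly_add, and poly_mul are our assumptions, not part of the text.

```python
def trim(coeffs):
    """Drop trailing zeros so that the last entry, if any, is the leading coefficient."""
    out = list(coeffs)
    while out and out[-1] == 0:
        out.pop()
    return out


def poly_add(s, t):
    """Coordinatewise sum: the i-th coefficient is s_i + t_i."""
    n = max(len(s), len(t))
    s = list(s) + [0] * (n - len(s))
    t = list(t) + [0] * (n - len(t))
    return trim(a + b for a, b in zip(s, t))


def poly_mul(s, t):
    """Product: the k-th coefficient is the sum of s_i * t_j over all i + j = k."""
    if not s or not t:
        return []
    a = [0] * (len(s) + len(t) - 1)
    for i, s_i in enumerate(s):
        for j, t_j in enumerate(t):
            a[i + j] += s_i * t_j
    return trim(a)


print(poly_add([1, 2], [3, 0, 5]))   # [4, 2, 5]
print(poly_mul([1, 2, 3], [4, 5]))   # [4, 13, 22, 15]
```

In the second example the coefficients are 1·4, 1·5 + 2·4, 2·5 + 3·4, and 3·5, matching the displayed formula for στ.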

We will soon prove that R[x] is a commutative ring. The next proposition
shows where the formula for multiplication comes from.

Proposition 3.23. If R is a commutative ring and r, s_i, t_j ∈ R for i ≥ 0 and
j ≥ 0, then

(s_0 + s_1 r + ···)(t_0 + t_1 r + ···) = a_0 + a_1 r + ··· + a_k r^k + ···,

where a_k = Σ_{i+j=k} s_i t_j for all k ≥ 0.

Remark. This proof should be an induction on k ≥ 0, but we give an informal
proof instead. ◄

Proof. Write Σ_i s_i r^i = f(r) and Σ_j t_j r^j = g(r). Then

f(r)g(r) = (s_0 + s_1 r + s_2 r² + ···)g(r)
         = s_0 g(r) + s_1 r g(r) + s_2 r² g(r) + ···
         = s_0(t_0 + t_1 r + ···) + s_1 r(t_0 + t_1 r + ···)
           + s_2 r²(t_0 + t_1 r + ···) + ···
         = s_0 t_0 + (s_1 t_0 + s_0 t_1)r + (s_2 t_0 + s_1 t_1 + s_0 t_2)r²
           + (s_0 t_3 + s_1 t_2 + s_2 t_1 + s_3 t_0)r³ + ···. •

Lemma 3.24. Let R be a commutative ring and let σ, τ ∈ R[x] be nonzero
polynomials.

(i) Either στ = 0 or deg(στ) ≤ deg(σ) + deg(τ).

(ii) If R is a domain, then στ ≠ 0 and

deg(στ) = deg(σ) + deg(τ).
Proof.

(i) Let σ = (s_0, s_1, ...) have degree m, let τ = (t_0, t_1, ...) have degree n, and
let στ = (a_0, a_1, ...). It suffices to prove that a_k = 0 for all k > m + n. By
definition,

a_k = Σ_{i+j=k} s_i t_j.

If i ≤ m, then j = k − i ≥ k − m > n (because k > m + n), and so t_j = 0
(because τ has degree n); if i > m, then s_i = 0 because σ has degree m. In
either case, each term s_i t_j = 0, and so a_k = Σ_{i+j=k} s_i t_j = 0.

(ii) Now let k = m + n. With the possible exception of s_m t_n (the product of the
leading coefficients of σ and τ), the same calculation as in part (i) shows that
each term s_i t_j in

a_{m+n} = s_0 t_{m+n} + ··· + s_{m−1} t_{n+1} + s_m t_n + s_{m+1} t_{n−1} + ··· + s_{m+n} t_0

is 0. If i < m, then m − i > 0, hence j = m − i + n > n, and so t_j = 0; if
i > m, then s_i = 0. Hence

a_{m+n} = s_m t_n.

Since R is a domain, s_m ≠ 0 and t_n ≠ 0 imply s_m t_n ≠ 0; hence, στ ≠ 0 and
deg(στ) = m + n = deg(σ) + deg(τ). •
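The difference between parts (i) and (ii) shows up as soon as the coefficient ring has zero divisors. The sketch below multiplies polynomials with coefficients in I_m, storing a polynomial as its list of coefficients reduced mod m; the names poly_mul_mod and degree are ours, and degree returns None for the zero polynomial, which has no degree.

```python
def poly_mul_mod(s, t, m):
    """Product in I_m[x]: the k-th coefficient is the sum of s_i * t_j over i + j = k, mod m."""
    if not s or not t:
        return []
    a = [0] * (len(s) + len(t) - 1)
    for i, s_i in enumerate(s):
        for j, t_j in enumerate(t):
            a[i + j] = (a[i + j] + s_i * t_j) % m
    while a and a[-1] == 0:   # drop trailing zeros so the last entry is the leading coefficient
        a.pop()
    return a


def degree(p):
    """Degree of a nonzero polynomial; None for the zero polynomial."""
    return len(p) - 1 if p else None


sigma, tau = [1, 2], [1, 3]          # 1 + 2x and 1 + 3x
print(poly_mul_mod(sigma, tau, 6))   # [1, 5]: in I_6[x] the x^2 term vanishes, degree 1 < 1 + 1
print(poly_mul_mod(sigma, tau, 5))   # [1, 0, 1]: in I_5[x] the degree is 2 = 1 + 1
print(poly_mul_mod([0, 2], [0, 3], 6))  # []: (2x)(3x) = 0 in I_6[x]
```

Since I_5 is a domain and I_6 is not, the first and third outputs illustrate part (i) and the second illustrates part (ii).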

Proposition 3.25. 

(i) If R is a commutative ring, then R[x] is a commutative ring that contains
R as a subring.

(ii) If R is a domain, then R[x] is a domain.

Proof.

(i) Addition and multiplication are operations on R[x]: the sum of two polynomials
σ and τ is a sequence which is also a polynomial (indeed, either σ + τ = 0
or deg(σ + τ) ≤ max{deg(σ), deg(τ)}), while the lemma shows that the sequence
which is the product of two polynomials is a polynomial as well. Verifications
of the axioms for a commutative ring are again routine, and they are left
to the reader. Note that the zero is the zero polynomial, the one is the polynomial
(1, 0, 0, ...), and the negative of (s_0, s_1, ..., s_i, ...) is (−s_0, −s_1, ..., −s_i, ...).
The only possible problem is proving associativity of multiplication; we give
the hint that if ρ = (r_0, r_1, ..., r_i, ...), then the ℓth coordinate of the polynomial
ρ(στ) turns out to be Σ_{i+j+k=ℓ} r_i(s_j t_k), while the ℓth coordinate of the
polynomial (ρσ)τ turns out to be Σ_{i+j+k=ℓ} (r_i s_j)t_k; these are equal because of
associativity of the multiplication in R.

It is easy to check that R′ = {(r, 0, 0, ...) : r ∈ R} is a subring of R[x],
and we identify R′ with R by identifying r ∈ R with (r, 0, 0, ...).
(ii) If R is a domain and if σ and τ are nonzero polynomials, then Lemma 3.24
shows that στ ≠ 0. Therefore, R[x] is a domain. •

Just as our assertion (in Theorem 3.21) that a domain is a subring of its fraction
field was not quite true, so, too, our assertion here that a commutative ring
R is a subring of R[x] is not quite correct. There is a subring of R[x], namely
R′ = {(r, 0, 0, ...) : r ∈ R}, which strongly resembles R, and the statement of
Proposition 3.25 will be made precise once the notion of isomorphism is introduced
(see Example 3.31).

We can now recapture the usual notation. 

Definition. Define the indeterminate to be the element

x = (0, 1, 0, 0, ...) ∈ R[x].

Even though x is neither “the unknown” nor a variable, we call it the indeterminate
to recall one’s first encounter with it in high school (see the discussion
on page 238). However, the indeterminate x is a specific element in the ring
R[x], namely, the polynomial (t_0, t_1, t_2, ...) with t_1 = 1 and all other t_i = 0.
One reason we insist that commutative rings have ones is to enable us to make
this definition; if the set E of even integers were a commutative ring, then E[x]
would not contain x (it would contain 2x, however). Note that if R is the zero
ring, then R[x] is also the zero ring.

Lemma 3.26. 

(i) If a = (so, s\, . . . , Sj, . . then 

xcr = (0, s 0 , si, ... , Sj, ...); 

that is, multiplying by x shifts each coefficient one step to the right. 

(ii) Ifn > 1, then x n is the polynomial having 0 everywhere except for 1 in the 
nth coordinate. 

(iii) Ifr € R, then 

(r, 0, 0 )(s 0 , si ,Sj,...)= (rs 0 , rs\ rsj, ...). 

Proof.

(i) Write x = (t_0, t_1, ..., t_i, ...), where t_1 = 1 and all other t_i = 0,
and let xσ = (a_0, a_1, ..., a_k, ...). Now a_0 = t_0 s_0 = 0 because t_0 = 0.
If k ≥ 1, then the only nonzero term in the sum a_k = Σ_{i+j=k} s_i t_j is
s_{k−1} t_1 = s_{k−1}, because t_1 = 1 and t_i = 0 for i ≠ 1; thus, for k ≥ 1,
the kth coordinate a_k of xσ is s_{k−1}, and xσ = (0, s_0, s_1, ..., s_i, ...).

(ii) An easy induction, using (i).

(iii) This follows easily from the definition of multiplication. •
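
The three parts of Lemma 3.26 can be checked numerically. The following is a minimal sketch, assuming integer coefficients and representing a polynomial by the finite list of its coefficients (s_0, s_1, ...); the name poly_mul is ours and is not notation from the text.

```python
def poly_mul(sigma, tau):
    """Multiply two polynomials given as coefficient lists (s_0, s_1, ...)."""
    if not sigma or not tau:
        return []
    product = [0] * (len(sigma) + len(tau) - 1)
    for i, s in enumerate(sigma):
        for j, t in enumerate(tau):
            product[i + j] += s * t  # contributes to coordinate k = i + j
    return product


x = [0, 1]          # the indeterminate (0, 1, 0, 0, ...)
sigma = [5, -2, 7]  # the sequence (5, -2, 7, 0, 0, ...)

shifted = poly_mul(x, sigma)           # part (i): coefficients shift right
x_cubed = poly_mul(x, poly_mul(x, x))  # part (ii) with n = 3
scaled = poly_mul([3], sigma)          # part (iii) with r = 3
```

Here shifted is the list for (0, 5, −2, 7, 0, ...), as part (i) predicts.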
If we identify (r, 0, 0, ...) with r, then Lemma 3.26(iii) reads
r(s_0, s_1, ..., s_i, ...) = (rs_0, rs_1, ..., rs_i, ...).

We can now recapture the usual notation. 

Proposition 3.27. If σ = (s_0, s_1, ..., s_n, 0, 0, ...), then

σ = s_0 + s_1 x + s_2 x^2 + ··· + s_n x^n,

where each element s ∈ R is identified with the polynomial (s, 0, 0, ...).

Proof.

σ = (s_0, s_1, ..., s_n, 0, 0, ...)

= (s_0, 0, 0, ...) + (0, s_1, 0, ...) + ··· + (0, 0, ..., s_n, 0, ...)

= s_0(1, 0, 0, ...) + s_1(0, 1, 0, ...) + ··· + s_n(0, 0, ..., 1, 0, ...)

= s_0 + s_1 x + s_2 x^2 + ··· + s_n x^n. •

We shall use this familiar (and standard) notation from now on. As is
customary, we shall write

f(x) = s_0 + s_1 x + s_2 x^2 + ··· + s_n x^n

instead of σ = (s_0, s_1, ..., s_n, 0, 0, ...).

Definition. If R is a commutative ring, then R[x] is called the ring of
polynomials over R.

Here is some standard vocabulary associated with polynomials. If f(x) =
s_0 + s_1 x + s_2 x^2 + ··· + s_n x^n, where s_n ≠ 0, then s_0 is called its
constant term and, as we have already said, s_n is called its leading
coefficient. If its leading coefficient s_n = 1, then f(x) is called monic.
Every polynomial other than the zero polynomial 0 (having all coefficients 0)
has a degree. A constant polynomial is either the zero polynomial or a
polynomial of degree 0. Polynomials of degree 1, namely, a + bx with b ≠ 0, are
called linear, polynomials of degree 2 are quadratic,¹⁰ degree 3’s are cubic,
then quartic, quintic, etc.

¹⁰ Quadratic polynomials are so called because the particular quadratic x^2
gives the area of a square (quadratic comes from the Latin word meaning “four,”
which is to remind one of the 4-sided figure); similarly, cubic polynomials are
so called because x^3 gives the volume of a cube. Linear polynomials are so
called because the graph of a linear polynomial in ℝ[x] is a line.
Corollary 3.28. Polynomials f(x) = s_0 + s_1 x + s_2 x^2 + ··· + s_n x^n and
g(x) = t_0 + t_1 x + t_2 x^2 + ··· + t_m x^m are equal if and only if s_i = t_i
for all i ∈ N.

Proof. We have merely restated the definition of equality of polynomials in 
terms of the familiar notation. • 

We can now describe the usual role of the indeterminate x as a variable. If R
is a commutative ring, each polynomial f(x) = s_0 + s_1 x + s_2 x^2 + ··· +
s_n x^n ∈ R[x] defines a polynomial function f^♭ : R → R by evaluation: if
r ∈ R, define f^♭(r) = s_0 + s_1 r + s_2 r^2 + ··· + s_n r^n ∈ R [usually, one
is not so fussy, and one writes f(r) instead of f^♭(r)]. The reader should
realize that polynomials and polynomial functions are distinct objects. For
example, if R is a finite ring, e.g., I_m, then there are only finitely many
functions from R to itself; a fortiori, there are only finitely many polynomial
functions. On the other hand, if R is not the zero ring, there are infinitely
many polynomials. For example, all the powers 1, x, x^2, ..., x^n, ... are
distinct, by Corollary 3.28.
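
The distinction between polynomials and polynomial functions can be seen concretely over I_3. The sketch below is only an illustration under our own conventions: congruence classes are represented by 0, 1, 2, a polynomial by its coefficient list, and a function I_3 → I_3 by its table of values; the helper name poly_function is hypothetical.

```python
from itertools import product


def poly_function(coeffs, m):
    """Table of values r -> f(r) mod m for r = 0, ..., m - 1."""
    return tuple(
        sum(c * r**i for i, c in enumerate(coeffs)) % m for r in range(m)
    )


m = 3
zero_poly = [0]
f = [0, -1, 0, 1]  # x^3 - x, which is not the zero polynomial

# Different coefficient sequences, yet the same function on I_3.
same_function = poly_function(f, m) == poly_function(zero_poly, m)

# The 3**6 polynomials of degree < 6 give at most 3**3 distinct functions.
hit = {poly_function(list(c), m) for c in product(range(m), repeat=6)}
```

The set hit can never have more than 27 elements, however many polynomials are fed into it, because there are only 27 functions from I_3 to itself.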

Definition. Let F be a field. The fraction field of F[x], denoted by F(x), is
called the field of rational functions over F.

Proposition 3.29. The elements of F(x) have the form f(x)/g(x), where f(x),
g(x) ∈ F[x] and g(x) ≠ 0.

Proof. By Theorem 3.21, every element in the fraction field F(x) has the form
f(x)g(x)^{−1}. •

Proposition 3.30. If p is a prime, then the field of rational functions F_p(x)
is an infinite field whose prime field is F_p.

Proof. By Proposition 3.25, F_p[x] is a domain. Its fraction field F_p(x) is a
field containing F_p[x] as a subring, while F_p[x] contains F_p as a subring,
by Proposition 3.25. That F_p is the prime field follows from Exercise 3.24 on
page 232. •

In spite of the difference between polynomials and polynomial functions
(we shall see, in Corollary 3.52, that these objects coincide when the
coefficient ring R is an infinite field), one often calls R[x] the ring of all
polynomials over R in one variable (or polynomials over R in one
indeterminate). If we write A = R[x], then the polynomial ring A[y] is called
the ring of all polynomials over R in two variables x and y (or
indeterminates), and it is denoted by R[x, y]. For example, the quadratic
polynomial ax^2 + bxy + cy^2 + dx + ey + f can be written
cy^2 + (bx + e)y + (ax^2 + dx + f), a polynomial in y with coefficients in
R[x]. By induction, one can form the commutative ring R[x_1, x_2, ..., x_n] of
all polynomials in n variables (or indeterminates) with coefficients in R.
Proposition 3.25 can now be generalized, by induction on n, to say that if R
is a domain, then so is R[x_1, x_2, ..., x_n]. Moreover, when F is a field, we
can describe Frac(F[x_1, x_2, ..., x_n]), denoted by F(x_1, x_2, ..., x_n), as
all rational functions in n variables (or indeterminates); its elements have
the form f(x_1, x_2, ..., x_n)/g(x_1, x_2, ..., x_n), where f and g lie in
F[x_1, x_2, ..., x_n].


Exercises 

3.27 Show that if R is a nonzero commutative ring, then R[x] is never a field.

*3.28 Let k be a field and let A be an n × n matrix with entries in k. If
f(x) = c_0 + c_1 x + ··· + c_m x^m ∈ k[x], define

f(A) = c_0 I + c_1 A + ··· + c_m A^m.

(i) Prove that k[A], defined by k[A] = {f(A) : f(x) ∈ k[x]}, is a
commutative ring under matrix addition and matrix multiplication.

(ii) If f(x) = p(x)q(x) ∈ k[x] and if A is an n × n matrix over k, prove that
f(A) = p(A)q(A).

(iii) Give examples of n × n matrices A and B such that k[A] is a domain and
k[B] is not a domain.

*3.29 (i) Let R be a domain. Prove that if a polynomial f(x) ∈ R[x] is a unit,
then f(x) is a nonzero constant (the converse is true if R is a field).

(ii) Show that ([2]x + [1])^2 = [1] in I_4[x]. Conclude that the statement in
part (i) may be false for commutative rings that are not domains.

*3.30 Show that if f(x) = x^p − x ∈ F_p[x], then its polynomial function
f^♭ : F_p → F_p is identically zero.

3.31 (i) If p is a prime and m, n ∈ N, prove that C(pm, pn) ≡ C(m, n) mod p,
where C(a, b) denotes the binomial coefficient “a choose b.”

(ii) Prove that C(p^r m, p^r n) ≡ C(m, n) mod p for all r ≥ 0.

(iii) Give another proof of Exercise 1.66: if p is a prime not dividing an
integer m ≥ 1, then p ∤ C(p^r m, p^r).

*3.32 Let α ∈ C, and let Z[α] be the smallest subring of C containing α; that
is, Z[α] = ∩ S, where S ranges over all those subrings of C containing α.
Prove that

Z[α] = {f(α) : f(x) ∈ Z[x]}.

3.33 If R is a commutative ring and f(x) = Σ_{i=0}^{n} a_i x^i ∈ R[x] has
degree n ≥ 1, define its derivative f'(x) ∈ R[x] by

f'(x) = a_1 + 2a_2 x + 3a_3 x^2 + ··· + n a_n x^{n−1};

if f(x) is a constant polynomial, define its derivative to be the zero
polynomial. Prove that the usual rules of calculus hold for this definition of
derivative; that is,

(f + g)' = f' + g';

(rf)' = rf' if r ∈ R;

(fg)' = fg' + f'g;

(f^n)' = n f^{n−1} f' for all n ≥ 1.

*3.34 Assume that (x − a) | f(x) in R[x]. Prove that (x − a)^2 | f(x) if and
only if (x − a) | f' in R[x].

3.35 (i) If f(x) = ax^{2p} + bx^p + c ∈ F_p[x], prove that f'(x) = 0.

(ii) State and prove a necessary and sufficient condition that a polynomial
f(x) ∈ F_p[x] have f'(x) = 0.

*3.36 If R is a commutative ring, define R[[x]], the ring of formal power
series over R, as the set of all sequences in R.

(i) Show that the formulas defining addition and multiplication on R[x]
make sense for R[[x]], and prove that R[[x]] is a commutative ring under
these operations.

(ii) Prove that R[x] is a subring of R[[x]].

(iii) Denote a formal power series σ = (s_0, s_1, s_2, ..., s_n, ...) by

σ = s_0 + s_1 x + s_2 x^2 + ···.

Prove that if σ = 1 + x + x^2 + ···, then σ = 1/(1 − x) is in R[[x]].

*3.37 If σ = (s_0, s_1, s_2, ..., s_n, ...) is a nonzero formal power series,
define ord(σ) = m, where m is the smallest natural number for which s_m ≠ 0.
Note that σ ≠ 0 if and only if it has an order.

(i) Prove that if R is a domain, then R[[x]] is a domain.

(ii) Prove that if k is a field, then a nonzero formal power series
σ ∈ k[[x]] is a unit if and only if ord(σ) = 0; that is, if its constant term
is nonzero.

(iii) Prove that if σ ∈ k[[x]] and ord(σ) = n, then

σ = x^n u,

where u is a unit in k[[x]].


3.4 Homomorphisms 

Just as one can use homomorphisms to compare groups, so one can use homo- 
morphisms to compare commutative rings. 

Definition. If A and R are commutative rings, a (ring) homomorphism is a
function f : A → R such that

(i) f(1) = 1;

(ii) f(a + a') = f(a) + f(a') for all a, a' ∈ A;

(iii) f(aa') = f(a)f(a') for all a, a' ∈ A.

A homomorphism that is also a bijection is called an isomorphism.
Commutative rings A and R are called isomorphic, denoted by A ≅ R, if there is
an isomorphism f : A → R.


Example 3.31. 


(i) Let R be a domain and let F = Frac(R) denote its fraction field. In
Theorem 3.21 we said that R is a subring of F, but that is not the truth; R
is not even a subset of F. We did find a subring R' of F, however, that
has a very strong resemblance to R, namely, R' = {[a, 1] : a ∈ R} ⊆ F.
The function f : R → R', given by f(a) = [a, 1], is easily seen to be an
isomorphism.

(ii) We implied that a commutative ring R is a subring of R[x] when we
“identified” an element r ∈ R with the constant polynomial (r, 0, 0, ...) [see
Lemma 3.26(iii)]. The subset R' = {(r, 0, 0, ...) : r ∈ R} is a subring
of R[x], and it is easy to see that the function f : R → R', defined by
f(r) = (r, 0, 0, ...), is an isomorphism. ◄


Example 3.32. 


(i) Complex conjugation z = a + ib ↦ \overline{z} = a − ib is a homomorphism
C → C because \overline{1} = 1, \overline{z + w} = \overline{z} + \overline{w},
and \overline{zw} = \overline{z} \overline{w}. Conjugation is an isomorphism
because it is its own inverse: for all z, the conjugate of \overline{z} is z.

(ii) Here is an example of a homomorphism of rings that is not an
isomorphism. Choose m ≥ 2 and define f : Z → I_m by f(n) = [n]. Notice that
f is surjective but not injective.

(iii) The preceding example can be generalized. If R is a commutative ring with
its one denoted by ε, then the function f : Z → R, defined by f(n) = nε,
is a ring homomorphism. ◄
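
The three axioms for the map n ↦ [n] of Example 3.32(ii) can be spot-checked by machine. This is a sketch under our own assumptions: the class [n] is represented by the remainder n mod m, the modulus m = 6 is an arbitrary choice, and only a finite sample of integers is tested, so it illustrates the axioms rather than proving them.

```python
m = 6


def f(n):
    """The map Z -> I_m, with the class [n] represented by n mod m."""
    return n % m


sample = range(-20, 21)

preserves_one = f(1) == 1
preserves_sum = all(
    f(a + b) == (f(a) + f(b)) % m for a in sample for b in sample
)
preserves_product = all(
    f(a * b) == (f(a) * f(b)) % m for a in sample for b in sample
)
surjective = {f(n) for n in range(m)} == set(range(m))
not_injective = f(0) == f(m)  # 0 and m are distinct integers, same class
```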


Proposition 3.33. Let R and S be commutative rings, and let φ : R → S be
a homomorphism. If s_1, ..., s_n ∈ S, then there exists a unique homomorphism
φ̃ : R[x_1, ..., x_n] → S with φ̃(x_i) = s_i for all i and φ̃(r) = φ(r) for all
r ∈ R.

Proof. The proof is by induction on n ≥ 1. If n = 1, denote x_1 by x and s_1 by
s. Define φ̃ : R[x] → S as follows: if f(x) = Σ_i r_i x^i, then

φ̃ : r_0 + r_1 x + ··· + r_n x^n ↦ φ(r_0) + φ(r_1)s + ··· + φ(r_n)s^n = φ̃(f).

By Corollary 3.28, a polynomial f(x) determines its sequence of coefficients,
and so the function φ̃ is well-defined; moreover, the formula shows that
φ̃(x) = s and φ̃(r) = φ(r) for all r ∈ R.

It remains to prove that φ̃ is a homomorphism. First, φ̃(1) = φ(1) = 1
because φ is a homomorphism. Second, if g(x) = a_0 + a_1 x + ··· + a_m x^m, then

φ̃(f + g) = φ̃(Σ_i (r_i + a_i) x^i)

= Σ_i φ(r_i + a_i) s^i

= Σ_i (φ(r_i) + φ(a_i)) s^i

= Σ_i φ(r_i) s^i + Σ_i φ(a_i) s^i

= φ̃(f) + φ̃(g).

Third, let f(x)g(x) = Σ_k c_k x^k, where c_k = Σ_{i+j=k} r_i a_j. Then

φ̃(fg) = φ̃(Σ_k c_k x^k)

= Σ_k φ(c_k) s^k

= Σ_k φ(Σ_{i+j=k} r_i a_j) s^k

= Σ_k (Σ_{i+j=k} φ(r_i)φ(a_j)) s^k.

On the other hand,

φ̃(f)φ̃(g) = (Σ_i φ(r_i) s^i)(Σ_j φ(a_j) s^j)

= Σ_k (Σ_{i+j=k} φ(r_i)φ(a_j)) s^k.

We let the reader show uniqueness of φ̃ by proving, by induction on n ≥ 0,
that when θ : R[x] → S is a homomorphism with θ(x) = s and θ(r) = φ(r) for
all r ∈ R, then

θ(r_0 + r_1 x + ··· + r_n x^n) = φ(r_0) + φ(r_1)s + ··· + φ(r_n)s^n.

The routine proof of the inductive step is left to the reader. •


Definition. If R is a commutative ring and s ∈ R, then evaluation at s is the
function e_s : R[x] → R, defined by e_s(f(x)) = f(s); that is,
e_s(Σ_i r_i x^i) = Σ_i r_i s^i.

Corollary 3.34. If R is a commutative ring and s ∈ R, then the evaluation map
e_s : R[x] → R is a homomorphism.

Proof. If we set R = S and φ = 1_R in Proposition 3.33, then φ̃ = e_s. •
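
Corollary 3.34 says that evaluating a sum or product of polynomials gives the sum or product of the values. The sketch below checks this for one pair of polynomials in Z[x]; the coefficient-list representation and the names poly_add, poly_mul, and evaluate are our own assumptions, and evaluate uses Horner's rule, which is a standard way to compute f(s) but is not mentioned in the text.

```python
def poly_add(f, g):
    """Add coefficient lists, padding the shorter one with zeros."""
    n = max(len(f), len(g))
    f = f + [0] * (n - len(f))
    g = g + [0] * (n - len(g))
    return [a + b for a, b in zip(f, g)]


def poly_mul(f, g):
    """Multiply nonempty coefficient lists by collecting terms with i + j = k."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out


def evaluate(f, s):
    """e_s(f) = r_0 + r_1 s + ... + r_n s^n, computed by Horner's rule."""
    value = 0
    for coeff in reversed(f):
        value = value * s + coeff
    return value


f = [1, 0, 2]  # 1 + 2x^2
g = [-3, 4]    # -3 + 4x
s = 5

sum_ok = evaluate(poly_add(f, g), s) == evaluate(f, s) + evaluate(g, s)
product_ok = evaluate(poly_mul(f, g), s) == evaluate(f, s) * evaluate(g, s)
one_ok = evaluate([1], s) == 1
```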

Certain properties of a ring homomorphism follow from its being a
homomorphism between the additive groups A and R. For example: f(0) = 0,
f(−a) = −f(a), and f(na) = nf(a) for all n ∈ Z. For readers not familiar
with groups, we prove these statements. Since 0 + 0 = 0, we have
f(0) = f(0) + f(0), so that subtracting f(0) from each side gives 0 = f(0).
Since −a + a = 0, we have f(−a) + f(a) = f(0) = 0; subtracting f(a)
from both sides gives f(−a) = −f(a). The statement f(na) = nf(a) for all
n ≥ 0 and all a ∈ A is proved by induction on n; finally, if n < 0,
then the result follows by replacing a by −a. That a homomorphism preserves
multiplication has similar consequences.


Lemma 3.35. If f : A → R is a ring homomorphism, then, for all a ∈ A,

(i) f(a^n) = f(a)^n for all n ≥ 0;

(ii) if a is a unit, then f(a) is a unit and f(a^{−1}) = f(a)^{−1};

(iii) if a is a unit, then f(a^{−n}) = f(a)^{−n} for all n ≥ 1.

Proof.

(i) If n = 0, then f(a^0) = 1 = (f(a))^0; this follows from our convention that
r^0 = 1 for any ring element r, together with the property f(1) = 1 satisfied
by every ring homomorphism. The statement for positive n is proved by induction
on n ≥ 1.

(ii) Applying f to the equation a^{−1}a = 1 shows that f(a) is a unit with
inverse f(a^{−1}).

(iii) Recall that a^{−n} = (a^{−1})^n, and invoke (i) and (ii). •
Corollary 3.36. If f : A → R is a ring homomorphism, then

f(U(A)) ⊆ U(R),

where U(A) is the group of units of A; if f is an isomorphism, then there is a
group isomorphism

U(A) ≅ U(R).

Proof. The first statement is just a rephrasing of part (ii) of Lemma 3.35: if
a is a unit in A, then f(a) is a unit in R.

If f is an isomorphism, then its inverse f^{−1} : R → A is also a ring
homomorphism, by Exercise 3.41(i) on page 248; hence, if r is a unit in R, then
f^{−1}(r) is a unit in A. It is now easy to check that φ : U(A) → U(R), defined
by a ↦ f(a), is a (group) isomorphism, for its inverse ψ : U(R) → U(A) is given
by r ↦ f^{−1}(r). •

Example 3.37. 

If X is a nonempty set, define a bitstring on X to be a function β : X → F_2,
and denote the set of all bitstrings on X by b(X). When X is finite, say, X =
{x_1, ..., x_n}, then a bitstring is just a sequence of 0’s and 1’s of length n.

Define binary operations on b(X): if β, γ ∈ b(X), define

βγ : x ↦ β(x)γ(x)

and

β + γ : x ↦ β(x) + γ(x).

That b(X) is a commutative ring under these operations is the special case of
Exercise 3.9 on page 227 with R = F_2.

Recall the Boolean ring B(X) (see Exercise 3.6(i) on page 226) whose
elements are the subsets of X, with multiplication defined as their
intersection: AB = A ∩ B, and with addition defined as their symmetric
difference: A + B = (A − B) ∪ (B − A). We now show that B(X) ≅ b(X).

If A is a subset of a set X, define its characteristic function χ_A : X → F_2
by

χ_A(x) = 1 if x ∈ A,
χ_A(x) = 0 if x ∉ A.

For example, χ_X is the constant function χ_X(x) = 1 for all x ∈ X, while χ_∅
is the constant function χ_∅(x) = 0 for all x ∈ X.

Define φ : B(X) → b(X) by φ(A) = χ_A, its characteristic function. If
x ∈ X, then x ∈ A if and only if χ_A(x) = 1. Hence, if χ_A = χ_B, then x ∈ A
if and only if x ∈ B; that is, A = B. It follows that φ is an injection. In
fact, φ is a bijection, for if f : X → F_2 is a bitstring, then φ(A) = f, where
A = {x ∈ X : f(x) = 1}.

We now show that φ is a ring isomorphism. The one in B(X) is X, and
φ(X) = χ_X, the constant function at 1, which is the one in b(X). Let A and B
be subsets of X. If x ∈ X, then

(χ_A χ_B)(x) = 1 if and only if x ∈ A and x ∈ B;

that is, χ_A χ_B = χ_{A∩B}. Hence, φ(AB) = φ(A)φ(B). Also

(χ_A + χ_B)(x) = 1 if and only if x ∈ A or x ∈ B but not both;

that is, χ_A + χ_B = χ_{(A∪B)−(A∩B)} = χ_{A+B} [recall that (A − B) ∪ (B − A) =
(A ∪ B) − (A ∩ B)]. Hence, φ(A + B) = φ(A) + φ(B). Therefore, φ is an
isomorphism. ◄
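
For a small set X the isomorphism of Example 3.37 can be verified exhaustively. In this sketch, which rests on our own conventions, X is a three-element list, a subset is a Python frozenset, and a bitstring is the tuple of values of a function X → F_2 listed in the order of X; the names chi, bit_add, and bit_mul are hypothetical.

```python
from itertools import combinations

X = ["a", "b", "c"]


def subsets(xs):
    """All subsets of xs, as frozensets."""
    return [frozenset(c) for k in range(len(xs) + 1) for c in combinations(xs, k)]


def chi(A):
    """Characteristic function of A, recorded as a tuple of bits indexed by X."""
    return tuple(1 if x in A else 0 for x in X)


def bit_add(beta, gamma):
    """Pointwise addition in F_2."""
    return tuple((b + c) % 2 for b, c in zip(beta, gamma))


def bit_mul(beta, gamma):
    """Pointwise multiplication in F_2."""
    return tuple(b * c for b, c in zip(beta, gamma))


all_subsets = subsets(X)

is_bijection = len({chi(A) for A in all_subsets}) == 2 ** len(X)
respects_mul = all(
    chi(A & B) == bit_mul(chi(A), chi(B)) for A in all_subsets for B in all_subsets
)
respects_add = all(
    chi(A ^ B) == bit_add(chi(A), chi(B)) for A in all_subsets for B in all_subsets
)
sends_one_to_one = chi(frozenset(X)) == (1, 1, 1)
```

The operators & and ^ on frozensets are intersection and symmetric difference, matching multiplication and addition in B(X).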

Definition. If f : A → R is a ring homomorphism, then its kernel is

ker f = {a ∈ A with f(a) = 0},

and its image is

im f = {r ∈ R : r = f(a) for some a ∈ A}.

Notice that if we forget the multiplications, then the rings A and R are addi- 
tive abelian groups and these definitions coincide with the group-theoretic ones. 

Let k be a field, let a ∈ k and, as in Corollary 3.34, consider the evaluation
homomorphism e_a : k[x] → k sending f(x) ↦ f(a). Now e_a is always
surjective, for if b ∈ k, then b = e_a(f), where f(x) = x − a + b. By
definition, ker e_a consists of all those polynomials g(x) for which g(a) = 0.

Proposition 3.38. If f : A → R is a ring homomorphism, where R is a nonzero
ring, then im f is a subring of R and ker f is a proper subset of A satisfying
the conditions:

(i) 0 ∈ ker f;

(ii) x, y ∈ ker f implies x + y ∈ ker f;

(iii) x ∈ ker f and a ∈ A imply ax ∈ ker f.

Proof. If r, r' ∈ im f, then r = f(a) and r' = f(a') for some a, a' ∈ A.
Hence, r − r' = f(a) − f(a') = f(a − a') ∈ im f and rr' = f(a)f(a') =
f(aa') ∈ im f. Since f(1) = 1, by the definition of a homomorphism, im f is
a subring of R.

We observed on page 243 that f(0) = 0, so that 0 ∈ ker f. If x, y ∈ ker f,
then f(x + y) = f(x) + f(y) = 0 + 0 = 0, so that x + y ∈ ker f. If x ∈ ker f
and a ∈ A, then f(ax) = f(a)f(x) = f(a)0 = 0, and so ax ∈ ker f. Note
that ker f is a proper subset of A, for f(1) = 1 ≠ 0, and so 1 ∉ ker f. •
The kernel of a group homomorphism G → H is not merely a subgroup; it
is a normal subgroup: it is closed under conjugation by any element in the group
G. Similarly, if f : A → R is a ring homomorphism, then ker f is almost¹¹ a
subring because it is closed under addition and multiplication. But ker f is
also closed under multiplication by any element in the commutative ring A.

Definition. An ideal in a commutative ring R is a subset I of R such that

(i) 0 ∈ I;

(ii) if a, b ∈ I, then a + b ∈ I;

(iii) if a ∈ I and r ∈ R, then ra ∈ I.

An ideal I ≠ R is called a proper ideal.

Proposition 3.38 can be restated. If f : A → R is a ring homomorphism,
where R is a nonzero ring, then im f is a subring of R and ker f is a proper
ideal in A.

There are two obvious examples of ideals in every nonzero commutative ring
R: the ring R itself and the subset {0} consisting of 0 alone. In
Proposition 3.43, we will see that a commutative ring having only these ideals
must be a field.

Example 3.39. 

If b_1, b_2, ..., b_n lie in R, then the set of all their linear combinations

I = {r_1 b_1 + r_2 b_2 + ··· + r_n b_n : r_i ∈ R for all i}

is an ideal in R. One writes I = (b_1, b_2, ..., b_n) in this case.

In particular, if n = 1, then

I = (b) = {rb : r ∈ R}

is an ideal in R; (b) consists of all the multiples of b, and it is called the
principal ideal generated by b.

Notice that R and {0} are always principal ideals: R = (1) and {0} = (0). In
Z, the even integers form the principal ideal (2). ◄

Theorem 3.40. Every ideal in Z is a principal ideal. 

Proof. This is just a restatement of Corollary 1.34. • 
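
Theorem 3.40 can be illustrated by comparing an ideal of Z with two generators against a principal ideal. The sketch below is a finite experiment under our own assumptions: the generators 6 and 10, the coefficient range, and the window bound are arbitrary choices, and the candidate generator is taken to be the gcd computed by Python's math.gcd.

```python
from itertools import product
from math import gcd

b1, b2 = 6, 10
d = gcd(b1, b2)  # candidate generator of the ideal (b1, b2)
bound = 30

# Linear combinations r1*b1 + r2*b2 with small coefficients r1, r2.
combos = {r1 * b1 + r2 * b2 for r1, r2 in product(range(-20, 21), repeat=2)}

# Compare the two ideals inside the window [-bound, bound].
window = {n for n in combos if abs(n) <= bound}
multiples = {n for n in range(-bound, bound + 1) if n % d == 0}
```

Within the window, the linear combinations of 6 and 10 are exactly the multiples of 2, in line with (6, 10) = (2).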

Can a principal ideal have more than one generator? 

¹¹ If f : A → R and R is not the zero ring, then ker f is not a subring because
it does not contain 1: f(1) = 1 ≠ 0.
Proposition 3.41. If R is a commutative ring and a = ub for some unit u ∈ R,
then (a) = (b). Conversely, if R is a domain, then (a) = (b) implies a = ub for
some unit u ∈ R.

Proof. Suppose that a = ub for some unit u ∈ R. If x ∈ (a), then x = ra =
rub ∈ (b) for some r ∈ R, so that (a) ⊆ (b). For the reverse inclusion, if
y ∈ (b), then y = sb for some s ∈ R. Hence, y = sb = su^{−1}a ∈ (a), so that
(b) ⊆ (a) and (a) = (b).

Conversely, if (a) = (b), then a ∈ (a) = (b) says that a = rb for some
r ∈ R; that is, b | a; similarly, b ∈ (b) = (a) implies a | b. By
Proposition 3.15, which assumes that R is a domain, there is a unit u ∈ R with
a = ub. •


Example 3.42. 

If an ideal I in a commutative ring R contains 1, then I = R, because I contains
r = r1 for every r ∈ R. Indeed, an ideal I contains a unit u if and only if
I = R. Sufficiency is obvious: if I = R, then I contains a unit, namely, 1.
Conversely, if u ∈ I for some unit u, then I contains u^{−1}u = 1, and so I
contains r = r1 for every r ∈ R. ◄

Proposition 3.43. A nonzero commutative ring R is a field if and only if its
only ideals are {0} and R itself.

Proof. Assume that R is a field. If I ≠ {0}, it contains some nonzero element,
and every nonzero element in a field is a unit. Therefore, I = R, by
Example 3.42.

Conversely, assume that R is a commutative ring whose only ideals are {0}
and R itself. If a ∈ R and a ≠ 0, then the principal ideal (a) = R, for
(a) ≠ {0}, and so 1 ∈ R = (a). There is thus r ∈ R with 1 = ra; that is, a has
an inverse in R, and so R is a field. •
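
Proposition 3.43 can be tested by brute force on the small rings I_m. The following sketch assumes our own representation of I_m by the residues 0, ..., m − 1 and simply tries every nonempty subset against the three conditions in the definition of an ideal; the function names are hypothetical, and the search is only practical for small m.

```python
from itertools import combinations


def is_ideal(I, m):
    """Check the three defining conditions of an ideal inside I_m."""
    return (
        0 in I
        and all((a + b) % m in I for a in I for b in I)
        and all((r * a) % m in I for a in I for r in range(m))
    )


def ideals(m):
    """All ideals of I_m, found by testing every nonempty subset."""
    return [
        set(c)
        for k in range(1, m + 1)
        for c in combinations(range(m), k)
        if is_ideal(set(c), m)
    ]


def is_field(m):
    """True if every nonzero class in I_m has a multiplicative inverse."""
    return m > 1 and all(
        any((a * b) % m == 1 for b in range(1, m)) for a in range(1, m)
    )


# For each m, record (number of ideals, whether I_m is a field).
results = {m: (len(ideals(m)), is_field(m)) for m in range(2, 8)}
```

In this range, I_m has exactly two ideals precisely when it is a field; for instance I_6 has the four ideals {0}, {0, 3}, {0, 2, 4}, and I_6.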


Proposition 3.44. A ring homomorphism f : A → R is an injection if and only
if ker f = {0}.

Proof. If f is an injection, then a ≠ 0 implies f(a) ≠ f(0) = 0, and so
a ∉ ker f. Therefore, ker f = {0}. Conversely, if ker f = {0} and
f(a) = f(a'), then 0 = f(a) − f(a') = f(a − a'). Hence, a − a' ∈ ker f = {0}
and a = a'; that is, f is an injection. •


Corollary 3.45. If f : k → R is a ring homomorphism, where R is not the zero
ring, and if k is a field, then f is an injection.

Proof. By the proposition, it suffices to prove that ker f = {0}. But ker f is
a proper ideal in k, by Proposition 3.38, and Proposition 3.43 shows that k has
only two ideals: k and {0}. Now ker f ≠ k, because f(1) = 1 ≠ 0 (R is not the
zero ring). Therefore, ker f = {0} and f is an injection. •


Exercises 

*3.38 Prove that the function k[x] → k[A], defined by f(x) ↦ f(A), is a
surjective ring homomorphism. (See Exercise 3.28 on page 239.)

3.39 Let A be a commutative ring. Prove that a subset J of A is an ideal if and
only if 0 ∈ J, u, v ∈ J implies u − v ∈ J, and u ∈ J, a ∈ A imply au ∈ J. (In
order that J be an ideal, u, v ∈ J should imply u + v ∈ J instead of
u − v ∈ J.)

3.40 (i) Prove that a field with 4 elements (see Exercise 3.17 on page 232) and
I_4 are not isomorphic commutative rings.

(ii) Prove that any two fields having exactly four elements are isomorphic.

*3.41 (i) Let φ : A → R be an isomorphism, and let ψ : R → A be its inverse.
Show that ψ is an isomorphism.

(ii) Show that the composite of two homomorphisms (or two isomorphisms)
is again a homomorphism (or an isomorphism).

(iii) Show that A ≅ R defines an equivalence relation on any family of
commutative rings.

3.42 Let R be a commutative ring and let 𝓕(R) be the commutative ring of all
functions f : R → R (see Exercise 3.9 on page 227).

(i) Show that R is isomorphic to the subring of 𝓕(R) consisting of all the
constant functions.

(ii) If f(x) = a_0 + a_1 x + ··· + a_n x^n ∈ R[x], let f^♭ : R → R be defined
by f^♭(r) = a_0 + a_1 r + ··· + a_n r^n [thus, f^♭ is the polynomial function
associated to f(x)]. Show that the function φ : R[x] → 𝓕(R), defined
by φ : f(x) ↦ f^♭, is a ring homomorphism.

(iii) Show that if R = F_p, where p is a prime, then x^p − x ∈ ker φ. (It will
be shown, in Theorem 3.50, that φ is injective when R is an infinite field.)

3.43 Let R be a commutative ring. Show that the function η : R[x] → R, defined
by

η : a_0 + a_1 x + a_2 x^2 + ··· + a_n x^n ↦ a_0,

is a homomorphism. Describe ker η in terms of roots of polynomials.

*3.44 Let R and S be commutative rings and let φ : R → S be a homomorphism.
Show that φ* : R[x] → S[x], defined by

φ* : r_0 + r_1 x + r_2 x^2 + ··· ↦ φ(r_0) + φ(r_1)x + φ(r_2)x^2 + ···,

is a homomorphism.

*3.45 Let R and S be commutative rings, and let ψ : R → S be a homomorphism
with ker ψ = I. If J is an ideal in S, prove that ψ^{−1}(J) is an ideal in R
which contains I.

3.46 If R is a commutative ring and c ∈ R, prove that the function
φ : R[x] → R[x], defined by f(x) ↦ f(x + c), is an isomorphism. In more detail,
φ(Σ_i s_i x^i) = Σ_i s_i (x + c)^i.

*3.47 Let p be a prime.

(i) Show that if p is a prime, then the function F : F_p → F_p, given by
F(a) = a^p, is an isomorphism (F is called the Frobenius map).

(ii) Show that every element a ∈ F_p has a pth root, i.e., there is b ∈ F_p
with a = b^p.

(iii) Let k be a field that contains F_p as a subfield [e.g.,
k = Frac(F_p[x])]. For every positive integer n, show that the function
F_n : k → k, given by F_n(a) = a^{p^n}, is an injective ring homomorphism.

3.48 If R is a field, show that R ≅ Frac(R). More precisely, show that the
homomorphism f : R → Frac(R) in Example 3.31, namely, r ↦ [r, 1], is an
isomorphism.

*3.49 Let R be a domain and let F be a field containing R as a subring.

(i) Prove that E = {uv^{−1} : u, v ∈ R and v ≠ 0} is a subfield of F
containing R as a subring.

(ii) Prove that Frac(R) ≅ E, where E is the subfield of F defined in part (i).
(See Exercise 3.21 on page 232.)

*3.50 (i) If A and R are domains and φ : A → R is a ring isomorphism, then
[a, b] ↦ [φ(a), φ(b)] is a ring isomorphism Frac(A) → Frac(R).

(ii) Show that a field k containing an isomorphic copy of Z as a subring must
contain an isomorphic copy of Q.

3.51 Let R be a domain with fraction field F = Frac(R).

(i) Prove that Frac(R[x]) ≅ F(x).

(ii) Prove that Frac(R[x_1, x_2, ..., x_n]) ≅ F(x_1, x_2, ..., x_n).

3.52 (i) If R and S are commutative rings, show that their direct product R × S
is also a commutative ring, where addition and multiplication in R × S are
defined “coordinatewise:”

(r, s) + (r', s') = (r + r', s + s') and (r, s)(r', s') = (rr', ss').

(ii) Show that R × {0} is an ideal in R × S.

(iii) Show that R × S is not a domain if neither R nor S is the zero ring.

*3.53 (i) If R and S are commutative rings, prove that

U(R × S) = U(R) × U(S),

where U(R) is the group of units of R.

(ii) Show that if m and n are relatively prime, then I_mn ≅ I_m × I_n as rings.

(iii) Use part (ii) to give a new proof of Corollary 2.128: if (m, n) = 1, then
φ(mn) = φ(m)φ(n), where φ is the Euler φ-function.

3.54 Let F be the set of all 2 × 2 real matrices of the form

A = [  a   b ]
    [ −b   a ].

(i) Prove that F is a field (with operations matrix addition and matrix
multiplication).

(ii) Prove that F is isomorphic to C.


3.5 Greatest Common Divisors 

We are now going to see that virtually all the theorems proved for Z in Chapter 1
have polynomial analogs in k[x], where k is a field; we shall also see that the
proofs there can be translated into proofs here.

The division algorithm for polynomials with coefficients in a field says that
long division is possible.


                                      s_n^{−1} t_m x^{m−n} + ···
                                    ______________________________________
s_n x^n + s_{n−1} x^{n−1} + ···  )  t_m x^m + t_{m−1} x^{m−1} + ···


Definition. If f(x) = s_n x^n + ··· + s_1 x + s_0 is a polynomial of degree n,
then its leading term is

LT(f) = s_n x^n.

Let k be a field and let f(x) = s_n x^n + ··· + s_1 x + s_0 and g(x) = t_m x^m +
··· + t_1 x + t_0 be polynomials in k[x] with deg(f) ≤ deg(g); that is, n ≤ m.
Then s_n^{−1} ∈ k, because k is a field, and

LT(g)/LT(f) = s_n^{−1} t_m x^{m−n} ∈ k[x];

thus, LT(f) | LT(g).

Theorem 3.46 (Division Algorithm). Let R be a commutative ring, let f(x),
g(x) ∈ R[x], and let the leading coefficient of f(x) be a unit in R.

(i) There are polynomials q(x), r(x) ∈ R[x] with

g(x) = q(x)f(x) + r(x),

where either r(x) = 0 or deg(r) < deg(f).

(ii) If R is a domain, then the polynomials q(x) and r(x) in part (i) are
unique.

Remark. If R is a field, the hypothesis that the leading coefficient of f(x) is
a unit is equivalent to f(x) ≠ 0. ◄

Proof.

(i) We prove the existence of q(x), r(x) ∈ R[x] as in the statement. If f | g,
then g = qf for some q; define the remainder r = 0, and we are done. If
f ∤ g, then consider all (necessarily nonzero) polynomials of the form g − qf
as q varies over R[x]. The least integer axiom provides a polynomial
r = g − qf having least degree among all such polynomials. Since g = qf + r, it
suffices to show that deg(r) < deg(f). Write f(x) = s_n x^n + ··· + s_1 x + s_0
and r(x) = t_m x^m + ··· + t_1 x + t_0. By hypothesis, s_n is a unit, and so
s_n^{−1} exists in R. If deg(r) ≥ deg(f), define

h(x) = r(x) − t_m s_n^{−1} x^{m−n} f(x);

that is, h = r − [LT(r)/LT(f)]f; note that h = 0 or deg(h) < deg(r). If h = 0,
then r = [LT(r)/LT(f)]f and

g = qf + r = qf + [LT(r)/LT(f)]f = [q + LT(r)/LT(f)]f,

contradicting f ∤ g. If h ≠ 0, then deg(h) < deg(r) and

g − qf = r = h + [LT(r)/LT(f)]f.

Thus, g − [q + LT(r)/LT(f)]f = h, contradicting r being a polynomial of least
degree having this form. Therefore, deg(r) < deg(f).

(ii) To prove uniqueness of q(x) and r(x), assume that g = q'f + r', where
deg(r') < deg(f). Then

(q − q')f = r' − r.

If r' ≠ r, then each side has a degree. But deg((q − q')f) = deg(q − q') +
deg(f) ≥ deg(f), while deg(r' − r) ≤ max{deg(r'), deg(r)} < deg(f), a
contradiction. Hence, r' = r and (q − q')f = 0. Now R[x] is a domain because
R is a domain. It follows that q − q' = 0 and q = q'. •


Our proof of the division algorithm for polynomials is written as an indirect
proof, but the proof can be recast so that it is a true algorithm, as is the
division algorithm for integers. Here is a pseudocode implementing it; recall
that LT(r)/LT(f) already carries the power x^{deg(r)−deg(f)}.

Input: g, f

Output: q, r

q := 0; r := g

WHILE r ≠ 0 AND LT(f) | LT(r) DO
    q := q + LT(r)/LT(f)
    r := r − [LT(r)/LT(f)] f

END WHILE
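
The pseudocode translates almost line for line into a program. Here is a minimal sketch, assuming coefficients in Q (via Python's fractions.Fraction) and polynomials stored as coefficient lists with the constant term first and the zero polynomial stored as the empty list; the names degree, trim, and divide are ours.

```python
from fractions import Fraction


def degree(p):
    """Degree of a trimmed coefficient list [c_0, c_1, ...]."""
    return len(p) - 1


def trim(p):
    """Drop trailing zero coefficients; the zero polynomial becomes []."""
    while p and p[-1] == 0:
        p = p[:-1]
    return p


def divide(g, f):
    """Return (q, r) with g = q*f + r and r = [] or deg(r) < deg(f)."""
    f = trim([Fraction(c) for c in f])
    if not f:
        raise ZeroDivisionError("f must be nonzero")
    q = [Fraction(0)] * max(len(g) - len(f) + 1, 0)
    r = trim([Fraction(c) for c in g])
    while r and degree(r) >= degree(f):  # i.e. LT(f) | LT(r)
        shift = degree(r) - degree(f)
        coeff = r[-1] / f[-1]            # LT(r)/LT(f) = coeff * x^shift
        q[shift] += coeff
        # r := r - [LT(r)/LT(f)] f
        r = trim([
            c - coeff * f[i - shift] if 0 <= i - shift < len(f) else c
            for i, c in enumerate(r)
        ])
    return trim(q), r


# Divide g(x) = x^3 - 2x + 5 by f(x) = 2x - 1.
q, r = divide([5, -2, 0, 1], [-1, 2])
```

In this example the quotient is (1/2)x^2 + (1/4)x − 7/8 and the remainder is the constant 33/8.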
Definition. If f(x) and g(x) are polynomials in k[x], where k is a field, then
the polynomials q(x) and r(x) occurring in the division algorithm are called the
quotient and the remainder after dividing g(x) by f(x).

The next theorem uses the division algorithm to divide by a monic
polynomial in Z[x]. Recall that cyclotomic polynomials were defined on page 29.

Proposition 3.47. For every positive integer n, the cyclotomic polynomial
Φ_n(x) is a monic polynomial all of whose coefficients are integers.

Proof. The proof is by induction on n ≥ 1. The base step holds, for Φ_1(x) =
x − 1. For the inductive step, we assume that Φ_d(x) is a monic polynomial
with integer coefficients for all d < n. From the equation
x^n − 1 = Π_{d|n} Φ_d(x) (see Proposition 1.27), we have

x^n − 1 = Φ_n(x)f(x),

where f(x) is the product of all Φ_d(x), where d is a proper divisor of n (i.e.,
d | n and d < n). By the inductive hypothesis, f(x) is a monic polynomial
with integer coefficients. Because f(x) is monic, the division algorithm for
monic polynomials in Z[x] shows that (x^n − 1)/f(x) ∈ Z[x] and, hence, all the
coefficients of Φ_n(x) = (x^n − 1)/f(x) are integers, as desired. •
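
The induction in this proof is also a recipe for computing Φ_n(x) using integer arithmetic only. The sketch below follows that recipe under our own conventions: polynomials are coefficient lists with the constant term first, and the names divide_monic, poly_mul, and cyclotomic are hypothetical helpers rather than notation from the text.

```python
from functools import lru_cache


def poly_mul(f, g):
    """Multiply nonempty integer coefficient lists."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out


def divide_monic(g, f):
    """Long division of g by a monic f in Z[x]; returns (q, r)."""
    assert f[-1] == 1, "divisor must be monic"
    r = list(g)
    q = [0] * (len(g) - len(f) + 1)
    for shift in range(len(g) - len(f), -1, -1):
        coeff = r[shift + len(f) - 1]  # current leading coefficient
        q[shift] = coeff
        for i, c in enumerate(f):
            r[shift + i] -= coeff * c  # no fractions needed: f is monic
    return q, r


@lru_cache(maxsize=None)
def cyclotomic(n):
    """Phi_n as a tuple of integer coefficients, constant term first."""
    x_n_minus_1 = [-1] + [0] * (n - 1) + [1]
    f = [1]
    for d in range(1, n):
        if n % d == 0:  # d is a proper divisor of n
            f = poly_mul(f, list(cyclotomic(d)))
    q, r = divide_monic(x_n_minus_1, f)
    assert not any(r), "x^n - 1 should be divisible by f"
    return tuple(q)


phi_6 = cyclotomic(6)    # x^2 - x + 1
phi_12 = cyclotomic(12)  # x^4 - x^2 + 1
```

Because the divisor is monic, each step of the long division subtracts an integer multiple of f, which is exactly why no denominators appear.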

We now turn our attention to roots of polynomials. 

Definition. If f(x) ∈ k[x], where k is a field, then a root of f(x) in k is an
element a ∈ k with f(a) = 0.

Remark. The polynomial f(x) = x^2 − 2 has its coefficients in Q, but we
usually say that √2 is a root of f(x) even though √2 ∉ Q. We shall see later,
in Theorem 3.118, that for every polynomial f(x) ∈ k[x], where k is any field,
there is a larger field E which contains k as a subfield and which “contains all
the roots” of f(x). ◄

Lemma 3.48. Let $f(x) \in k[x]$, where $k$ is a field, and let $a \in k$. Then there is
$q(x) \in k[x]$ with

$$f(x) = q(x)(x - a) + f(a).$$

Proof. Use the division algorithm to obtain

$$f(x) = q(x)(x - a) + r;$$

the remainder $r$ is a constant because $x - a$ has degree 1. By Corollary 3.34,
$e_a \colon k[x] \to k$, evaluation at $a$, is a ring homomorphism:

$$e_a(f) = e_a(q)\,e_a(x - a) + e_a(r).$$

Hence, $f(a) = q(a)(a - a) + r$, and so $r = f(a)$. •

There is a connection between roots and factoring.

Proposition 3.49. If $f(x) \in k[x]$, where $k$ is a field, then $a \in k$ is a root of
$f(x)$ in $k$ if and only if $x - a$ divides $f(x)$ in $k[x]$.

Proof. If $a$ is a root of $f(x)$ in $k$, then $f(a) = 0$ and the lemma gives $f(x) = q(x)(x - a)$.
Conversely, if $f(x) = q(x)(x - a)$, then evaluating at $a$ gives
$f(a) = q(a)(a - a) = 0$. •
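
Lemma 3.48 and Proposition 3.49 can be checked numerically. Division by $x - a$ is usually carried out by Horner's scheme (synthetic division), which is a different procedure from the general division algorithm but yields the same quotient and remainder. The function name `divide_by_linear` and the list representation (constant term first) are illustrative assumptions.

```python
def divide_by_linear(f, a):
    """Return (q, r) with f(x) = q(x)(x - a) + r, where r = f(a).

    f is a coefficient list, constant term first; q is returned the same way.
    """
    acc = 0
    partial = []
    for c in reversed(f):        # Horner's scheme, highest degree first
        acc = acc * a + c
        partial.append(acc)
    r = partial.pop()            # the final accumulated value is f(a)
    partial.reverse()            # the remaining values are the coefficients of q
    return partial, r
```

With $f(x) = x^3 - x^2 - x + 1$, dividing by $x - 1$ leaves remainder $0$, so $1$ is a root and $x - 1$ is a factor; dividing by $x - 2$ leaves remainder $f(2) = 3$, so $2$ is not a root.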

Theorem 3.50. Let $k$ be a field and let $f(x) \in k[x]$.

(i) If $f(x)$ has degree $n$, then $f(x)$ has at most $n$ roots in $k$.

(ii) If $f(x)$ has degree $n$ and $a_1, a_2, \ldots, a_n \in k$ are distinct roots of $f(x)$ in $k$,
then there is $c \in k$ and a factorization

$$f(x) = c(x - a_1)(x - a_2) \cdots (x - a_n).$$

Proof.

(i) We prove the statement by induction on $n \ge 0$. If $n = 0$, then $f(x)$ is a
nonzero constant, and so the number of its roots in $k$ is zero. Now let $n > 0$. If
$f(x)$ has no roots in $k$, then we are done, for $0 \le n$. Otherwise, we may assume
that there is $a \in k$ with $a$ a root of $f(x)$; hence, by Proposition 3.49,

$$f(x) = q(x)(x - a),$$

and $q(x) \in k[x]$ has degree $n - 1$. If there is a root $b \in k$ with $b \ne a$, then

$$0 = f(b) = q(b)(b - a).$$

Since $b - a \ne 0$ and $k$ is a field (and, hence, a domain), we have $q(b) = 0$, so
that $b$ is a root of $q(x)$. But $\deg(q) = n - 1$, so that the inductive hypothesis says
that $q(x)$ has at most $n - 1$ roots in $k$. Therefore, $f(x)$ has at most $n$ roots in $k$.

(ii) The proof, by induction on $n \ge 1$, is left to the reader. •

Example 3.51.

Theorem 3.50 is false for arbitrary commutative rings. The polynomial $x^2 - 1 \in \mathbb{I}_8[x]$
has 4 roots in $\mathbb{I}_8$, namely, [1], [3], [5], and [7]. ◄

Recall that every polynomial $f(x) = c_n x^n + c_{n-1} x^{n-1} + \cdots + c_0 \in k[x]$
determines the polynomial function $f^\flat \colon k \to k$ with $f^\flat(a) = c_n a^n + c_{n-1} a^{n-1} + \cdots + c_0$
for all $a \in k$. In Exercise 3.30 on page 239, however, we saw that a
nonzero polynomial in $\mathbb{F}_p[x]$, e.g., $x^p - x$, can determine the constant function
zero. This pathology vanishes when the field $k$ is infinite.

Corollary 3.52. Let $k$ be an infinite field and let $f(x)$ and $g(x)$ be polynomials
in $k[x]$. If $f(x)$ and $g(x)$ determine the same polynomial function, i.e., if
$f^\flat(a) = g^\flat(a)$ for all $a \in k$, then $f(x) = g(x)$.

Proof. If $f(x) \ne g(x)$, then the polynomial $f(x) - g(x)$ is nonzero, so that
it has some degree, say, $n$. Now every element of $k$ is a root of $f(x) - g(x)$;
since $k$ is infinite, this polynomial of degree $n$ has more than $n$ roots, and this
contradicts Theorem 3.50(i). •
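
Both phenomena just discussed, too many roots over a ring that is not a domain and a nonzero polynomial vanishing identically on a finite field, are easy to confirm by brute force. The helper `roots_mod` below is a hypothetical name; it simply tries every residue class modulo $m$.

```python
def roots_mod(coeffs, m):
    """List all a in {0, ..., m-1} with f(a) = 0 (mod m).

    coeffs holds the integer coefficients of f, constant term first.
    """
    return [
        a
        for a in range(m)
        if sum(c * pow(a, i, m) for i, c in enumerate(coeffs)) % m == 0
    ]

# x^2 - 1 has degree 2 but four roots modulo 8 (Example 3.51).
print(roots_mod([-1, 0, 1], 8))

# x^5 - x is a nonzero polynomial, yet every element of F_5 is a root.
print(roots_mod([0, -1, 0, 0, 0, 1], 5))
```

Modulo the prime 7, the same polynomial $x^2 - 1$ has only the two roots $[1]$ and $[6]$, in agreement with Theorem 3.50(i).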

The last proof gives a bit more.

Corollary 3.53. Let $k$ be any (possibly finite) field, let $f(x), g(x) \in k[x]$, and
let $n = \max\{\deg(f), \deg(g)\}$. If there are $n + 1$ distinct elements $a \in k$ with
$f(a) = g(a)$, then $f(x) = g(x)$.

Proof. If $f(x) \ne g(x)$, then $h(x) = f(x) - g(x) \ne 0$, and

$$\deg(h) \le \max\{\deg(f), \deg(g)\} = n.$$

By hypothesis, there are $n + 1$ elements $a \in k$ with $h(a) = f(a) - g(a) = 0$,
contradicting Theorem 3.50(i). Therefore, $h(x) = 0$ and $f(x) = g(x)$. •

Corollary 3.54 (Lagrange Interpolation). Let $k$ be a field, and let $u_0, \ldots, u_n$
be distinct elements of $k$. Given any list $y_0, \ldots, y_n$ in $k$, there exists a unique
$f(x) \in k[x]$ of degree $\le n$ with $f(u_i) = y_i$ for all $i = 0, \ldots, n$. In fact,

$$f(x) = \sum_{i=0}^{n} y_i \, \frac{(x - u_0) \cdots \widehat{(x - u_i)} \cdots (x - u_n)}{(u_i - u_0) \cdots \widehat{(u_i - u_i)} \cdots (u_i - u_n)},$$

where a hat indicates that the factor is omitted.

Proof. The polynomial $f(x)$ displayed in the formula has degree at most $n$ and
$f(u_i) = y_i$ for all $i$. Uniqueness follows from Corollary 3.53. •
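
The interpolation formula translates directly into code. The following sketch evaluates the interpolating polynomial at a point using exact rational arithmetic; it assumes $k = \mathbb{Q}$, and the function name `lagrange_eval` is ours.

```python
from fractions import Fraction

def lagrange_eval(us, ys, x):
    """Evaluate at x the unique polynomial of degree <= n through (u_i, y_i).

    us must contain distinct rational numbers (or integers).
    """
    total = Fraction(0)
    for i, (ui, yi) in enumerate(zip(us, ys)):
        term = Fraction(yi)
        for j, uj in enumerate(us):
            if j != i:                       # the factor with j == i is omitted
                term *= Fraction(x - uj, ui - uj)
        total += term
    return total
```

For example, the data $(0, 1), (1, 3), (2, 7)$ lie on $x^2 + x + 1$, and `lagrange_eval([0, 1, 2], [1, 3, 7], 3)` gives $13$, which is the value of that polynomial at $3$.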


Remark. The Lagrange interpolation formula can be shortened using Exercise 1.14
on page 14: if $f = g_0 \cdots g_n$, then $f' = \sum_{i=0}^{n} d_i f$, where
$d_i f = g_0 \cdots g_i' \cdots g_n$. If $g_i(x) = x - u_i$, then

$$f(x) = \sum_{i=0}^{n} y_i \, \frac{d_i f(x)}{d_i f(u_i)},$$

because $d_i f(x) = (x - u_0) \cdots \widehat{(x - u_i)} \cdots (x - u_n)$. ◄


The definition of greatest common divisor in $k[x]$, where $k$ is a field, is
essentially the same as the corresponding definition for integers. We shall define
greatest common divisors in general domains on page 256.

Definition. If $k$ is a field and $f(x), g(x) \in k[x]$, then a common divisor is a
polynomial $c(x) \in k[x]$ with $c(x) \mid f(x)$ and $c(x) \mid g(x)$.

If $f(x) = 0 = g(x)$, then the greatest common divisor, abbreviated gcd,
is defined to be 0. If at least one of $f(x), g(x)$ is nonzero, then the greatest
common divisor, denoted by $(f, g)$, is a monic common divisor $d(x) \in k[x]$
with $\deg(c) \le \deg(d)$ for every common divisor $c(x)$.

The following elementary lemma often allows us to prove results about arbi- 
trary polynomials from the special case of monic polynomials. 

Lemma 3.55. Let $f(x)$ be a nonzero polynomial in $k[x]$, where $k$ is a field. If
$h(x) = a_n x^n + \cdots + a_0 \in k[x]$ is a divisor of $f(x)$, then $a_n^{-1} h(x)$ is a monic
divisor of $f(x)$ having the same degree as $h(x)$.

Proof. Since $k$ is a field, $a_n \in k$ nonzero implies $a_n^{-1} \in k$. If $f(x) = c(x)h(x)$,
then $f(x) = [a_n c(x)][a_n^{-1} h(x)]$. Of course, $\deg(a_n^{-1} h) = \deg(h)$. •

It follows from the lemma that if $f(x), g(x) \in k[x]$, where $k$ is a field, and
if $c(x)$ is a common divisor of $f(x)$ and $g(x)$, then there is a monic common
divisor having degree $\deg(c)$.

Proposition 3.56. If $k$ is a field, then every pair $f(x), g(x) \in k[x]$ has a gcd.

Proof. Since the proposition is obvious if both $f(x)$ and $g(x)$ are 0, we may
assume that $f(x) \ne 0$. If $h(x)$ is a divisor of $f(x)$, then $\deg(h) \le \deg(f)$; a
fortiori, $\deg(f)$ is an upper bound for the degrees of the common divisors of
$f(x)$ and $g(x)$. Let $d(x)$ be a common divisor of largest degree; since $k$ is a
field, we may assume, by Lemma 3.55, that $d(x)$ is monic. •

Here is the analog of Theorem 1.32; we will soon use it to prove uniqueness
of gcd’s.

Theorem 3.57. If $k$ is a field and $f(x), g(x) \in k[x]$, then their gcd is a linear
combination of $f(x)$ and $g(x)$.

Remark. By linear combination, we now mean $sf + tg$, where both $s = s(x)$
and $t = t(x)$ are polynomials in $k[x]$. ◄

Proof. We may assume that at least one of $f$ and $g$ is not zero (for the gcd is 0
otherwise). Consider the set $I$ of all the linear combinations:

$$I = \{s(x)f(x) + t(x)g(x) : s(x), t(x) \in k[x]\}.$$

Now $f$ and $g$ are in $I$ (take $s = 1$ and $t = 0$ or vice versa). It follows that if
$N = \{n \in \mathbb{N} : n = \deg(h) \text{ for some nonzero } h(x) \in I\}$, then $N$ is nonempty. By the least
integer axiom, $N$ contains a smallest integer, say, $n$, and there is some $d(x) \in I$
with $\deg(d) = n$; as in Lemma 3.55, we may assume that $d(x)$ is monic. We
claim that $d(x) = (f, g)$.

Since $d \in I$, it is a linear combination of $f$ and $g$:

$$d = sf + tg.$$

Let us show that $d$ is a common divisor by trying to divide each of $f$ and $g$ by
$d$. The division algorithm gives $f = qd + r$, where $r = 0$ or $\deg(r) < \deg(d)$.
If $r \ne 0$, then

$$r = f - qd = f - q(sf + tg) = (1 - qs)f - qtg \in I,$$

contradicting $d$ having smallest degree among all linear combinations of $f$ and $g$.
Hence, $r = 0$ and $d \mid f$; a similar argument shows that $d \mid g$.

Finally, if $c$ is a common divisor of $f$ and $g$, then $c$ divides $d = sf + tg$. But
$c \mid d$ implies $\deg(c) \le \deg(d)$. Therefore, $d$ is the gcd of $f$ and $g$. •

Corollary 3.58. Let $k$ be a field and let $f(x), g(x) \in k[x]$.

(i) A monic common divisor $d(x)$ is the gcd if and only if $d(x)$ is divisible by
every common divisor.

(ii) Every two polynomials $f(x)$ and $g(x)$ have a unique gcd.

Proof. The last paragraph of the proof of Theorem 3.57 shows that every common
divisor $c$ of $f$ and $g$ is a divisor of $d = sf + tg$.

Conversely, let $d$ denote a gcd of $f$ and $g$, and let $d'$ be a common divisor
divisible by every common divisor $c$; thus, $d \mid d'$. On the other hand, $d$ is
divisible by every common divisor (as we have just seen in the first paragraph), so
that $d' \mid d$. By Proposition 3.15 (which applies because $k[x]$ is a domain), there
is a unit $u(x) \in k[x]$ with $d'(x) = u(x)d(x)$. By Exercise 3.29 on page 239, $u(x)$
is a nonzero constant; call it $u$. Hence, $d'(x) = u\,d(x)$; if the leading coefficients
are $s'$ and $s$, then $s' = us$. But both $d(x)$ and $d'(x)$ are monic, so that $u = 1$ and
$d'(x) = d(x)$. This last argument also proves uniqueness of the gcd. •

Greatest common divisors can be defined in any domain $R$ once one takes
note of Corollaries 1.33 and 3.58.

Definition. If $R$ is a domain and $a, b \in R$, then a common divisor is an element
$c \in R$ with $c \mid a$ and $c \mid b$. If $a = 0 = b$, then the greatest common divisor,
abbreviated gcd, is defined to be 0. If at least one of $a, b$ is nonzero, then a
greatest common divisor, denoted by $(a, b)$, is a common divisor $d \in R$ with
$c \mid d$ for every common divisor $c \in R$.

Exercise 3.75 on page 272 gives an example of a domain $R$ containing a pair
of elements having no gcd.

The definition just given is not quite the same as our definition of gcd in $\mathbb{Z}$
(in Chapter 1), because it does not insist that gcd’s be nonnegative. For example,
both 2 and $-2$ are gcd’s of 4 and 6 in the new sense; this shows why we wrote “a
gcd” in the latest definition instead of “the gcd.” Similarly, this definition of gcd
in $k[x]$ is slightly different from our earlier definition on page 255 because gcd’s
in the present sense need not be monic. In order to make them unique, gcd’s in $\mathbb{Z}$
are defined to be nonnegative and gcd’s in $k[x]$ are defined to be monic. In more
general domains, gcd’s are not unique. However, if both $d$ and $d'$ are gcd’s of
$a, b \in R$, then there is a unit $u \in R$ with $d' = ud$ (for each divides the other). In
$\mathbb{Z}$, gcd’s are unique up to sign (because the only units in $\mathbb{Z}$ are $\pm 1$), and we choose
the positive gcd as our favorite; in $k[x]$, where $k$ is a field, any two gcd’s differ
by a constant nonzero multiple (for the only units are the nonzero constants), and
we choose the monic polynomial as our favorite. But, in general domains, there
is no obvious choice of a “favorite,” and so we cannot single out one of the gcd’s
as being “the gcd.” By Proposition 3.41, the principal ideals generated by two
gcd’s $d$ and $d'$ of $a$ and $b$ coincide: $(d') = (d)$. Thus, even though the gcd of
elements $a$ and $b$ is not unique, the gcd’s determine a unique principal ideal.

Theorem 3.40 says that every ideal in $\mathbb{Z}$ is a principal ideal; its analog below
has essentially the same proof.

Theorem 3.59. If $k$ is a field, then every ideal $I$ in $k[x]$ is a principal ideal.
Moreover, if $I \ne \{0\}$, there is a monic polynomial that generates $I$.

Proof. If $I$ consists of 0 alone, take $d = 0$. If there are nonzero polynomials
in $I$, the least integer axiom allows us to choose a polynomial $d(x) \in I$ of least
degree. As in Lemma 3.55, we may assume that $d(x)$ is monic.

We claim that every $f$ in $I$ is a multiple of $d$. The division algorithm gives
polynomials $q$ and $r$ with $f = qd + r$, where either $r = 0$ or $\deg(r) < \deg(d)$.
Now $d \in I$ gives $qd \in I$, by part (iii) in the definition of ideal; hence part (ii)
gives $r = f - qd \in I$. If $r \ne 0$, then it has a degree and $\deg(r) < \deg(d)$,
contradicting $d$ having least degree among all the polynomials in $I$. Therefore,
$r = 0$ and $f$ is a multiple of $d$. •

The proof of Theorem 3.57 identifies the gcd of $f(x)$ and $g(x)$ (when at
least one of them is not the zero polynomial) as the monic generator of the ideal
$I = (f(x), g(x))$ consisting of all the linear combinations of $f(x)$ and $g(x)$.
Recall the notation introduced in Example 3.39: if $b_1, \ldots, b_n \in R$, then the
ideal consisting of all their linear combinations is denoted by $(b_1, \ldots, b_n)$. This
explains the notation $(f, g)$ for the gcd. Notice that this notation makes sense
even when both $f(x)$ and $g(x)$ are the zero polynomial: the gcd is 0, which is
the generator of the ideal $(0, 0) = \{0\}$.

Definition. A commutative ring $R$ is a principal ideal domain if it is a domain
in which every ideal is a principal ideal. One usually abbreviates this name to
PID.

Example 3.60.

(i) The ring $\mathbb{Z}$ of integers is a PID, by Theorem 3.40.

(ii) Every field is a PID, by Proposition 3.43.

(iii) If $k$ is a field, then the polynomial ring $k[x]$ is a PID, by Theorem 3.59.

(iv) If $k$ is a field, then Exercise 3.68 on page 272 shows that the ring $k[[x]]$ of
formal power series is a PID.

(v) There are rings other than $\mathbb{Z}$ and $k[x]$, where $k$ is a field, that have a
division algorithm; the ring of Gaussian integers $\mathbb{Z}[i]$ is an example of such a
ring. These rings are called euclidean rings, and they, too, are PID’s (see
Proposition 3.78). ◄

It is not true that ideals in arbitrary commutative rings are always principal 
ideals. 

Example 3.61.

(i) Let $R = \mathbb{Z}[x]$, the commutative ring of all polynomials over $\mathbb{Z}$. It is easy
to see that the set $I$ of all polynomials with even constant term is an ideal
in $\mathbb{Z}[x]$. We show that $I$ is not a principal ideal.

Suppose there is $d(x) \in \mathbb{Z}[x]$ with $I = (d(x))$. The constant $2 \in I$,
so that there is $f(x) \in \mathbb{Z}[x]$ with $2 = d(x)f(x)$. Since the degree of a
product is the sum of the degrees of the factors, $0 = \deg(2) = \deg(d) + \deg(f)$.
Since degrees are nonnegative, it follows that $\deg(d) = 0$, i.e.,
$d(x)$ is a nonzero constant. As constants here are integers, the candidates
for $d(x)$ are $\pm 1$ and $\pm 2$. Suppose $d(x) = \pm 2$; since $x \in I$, there is
$g(x) \in \mathbb{Z}[x]$ with $x = d(x)g(x) = \pm 2g(x)$. But every coefficient on
the right side is even, while the coefficient of $x$ on the left side is 1. This
contradiction gives $d(x) = \pm 1$. By Proposition 3.43, $I = \mathbb{Z}[x]$, another
contradiction. Therefore, no such $d(x)$ exists, that is, the ideal $I$ is not a
principal ideal.

(ii) Exercise 3.71 on page 272 shows that $k[x, y]$, where $k$ is a field, is not
a PID. More precisely, it asks you to prove that the ideal $(x, y)$ is not a
principal ideal. ◄

Example 3.62.

If $I$ and $J$ are ideals in a commutative ring $R$, we now show that $I \cap J$ is also
an ideal in $R$. Since $0 \in I$ and $0 \in J$, we have $0 \in I \cap J$. If $a, b \in I \cap J$, then
$a - b \in I$ and $a - b \in J$, for each is an ideal, and so $a - b \in I \cap J$. If $a \in I \cap J$
and $r \in R$, then $ra \in I$ and $ra \in J$, hence $ra \in I \cap J$. Therefore, $I \cap J$ is an
ideal. With minor alterations, this argument also proves that the intersection of
any, possibly infinite, family of ideals in $R$ is also an ideal in $R$. ◄

Definition. If $f(x), g(x) \in k[x]$, where $k$ is a field, then a common multiple
is a polynomial $m(x) \in k[x]$ with $f(x) \mid m(x)$ and $g(x) \mid m(x)$.

Given polynomials $f(x)$ and $g(x)$ in $k[x]$, both nonzero, define their least
common multiple, abbreviated lcm, to be a monic common multiple of them
having smallest degree. If $f(x) = 0$ or $g(x) = 0$, define their lcm to be 0. The lcm
of $f(x)$ and $g(x)$ is often denoted by

$$[f(x), g(x)].$$

Proposition 3.63. Assume that $k$ is a field and that $f(x), g(x) \in k[x]$ are
nonzero.

(i) $[f(x), g(x)]$ is the monic generator of $(f(x)) \cap (g(x))$.

(ii) Let $m(x)$ be a monic common multiple of $f(x)$ and $g(x)$. Then
$m(x) = [f(x), g(x)]$ if and only if $m(x)$ divides every common multiple of $f(x)$
and $g(x)$.

(iii) Every pair of polynomials $f(x)$ and $g(x)$ has a unique lcm.

Proof.

(i) Since $f(x) \ne 0$ and $g(x) \ne 0$, we have $(f) \cap (g) \ne \{0\}$, because $0 \ne fg \in (f) \cap (g)$.
By Theorem 3.59, $(f) \cap (g) = (m)$, where $m$ is the monic polynomial
of least degree in $(f) \cap (g)$. As $m \in (f)$, $m = qf$ for some $q(x) \in k[x]$, and
so $f \mid m$; similarly, $g \mid m$, so that $m$ is a common multiple of $f$ and $g$. If $M$ is
another common multiple, then $M \in (f)$ and $M \in (g)$, hence $M \in (f) \cap (g) = (m)$,
and so $m \mid M$. Therefore, $\deg(m) \le \deg(M)$, and $m = [f, g]$.

(ii) We have just seen that $[f, g]$ divides every common multiple $M$ of $f$ and $g$.
Conversely, assume that $m'$ is a monic common multiple that divides every other
common multiple, and write $m = [f, g]$. Now $m' \mid m$, because $m$ is a common multiple, while
$m \mid m'$, by part (i). Proposition 3.15 provides a unit $u(x) \in k[x]$ with
$m'(x) = u(x)m(x)$; by Exercise 3.29 on page 239, $u(x)$ is a nonzero constant.
Since both $m(x)$ and $m'(x)$ are monic, it follows that $m(x) = m'(x)$.

(iii) If $\ell$ and $\ell'$ are both lcm’s of $f(x)$ and $g(x)$, then part (ii) shows that each
divides the other. Uniqueness now follows from Proposition 3.15, for both $\ell$ and
$\ell'$ are monic. •

Here is the generalization of a prime number.

Definition. An element $p$ in a commutative ring $R$ is irreducible if $p$ is neither
0 nor a unit and if, in any factorization $p = ab$ in $R$, either $a$ or $b$ is a unit.

The irreducible elements in $\mathbb{Z}$ are $\pm p$, where $p$ is a prime. The next
proposition describes irreducible polynomials in $k[x]$, where $k$ is a field.

Proposition 3.64. If $k$ is a field, then a nonconstant polynomial $p(x) \in k[x]$
is irreducible in $k[x]$ if and only if $p(x)$ has no factorization in $k[x]$ of the form
$p(x) = f(x)g(x)$ in which both factors have degree smaller than $\deg(p)$.

Proof. If $p(x)$ is irreducible, it must be nonconstant. If $p(x) = f(x)g(x)$ in
$k[x]$, where both factors have degree smaller than $\deg(p)$, then neither $\deg(f)$
nor $\deg(g)$ is 0, and so neither factor is a unit in $k[x]$. This is a contradiction.

Conversely, if $p(x)$ cannot be factored into polynomials of smaller degree,
then its only factors have the form $a$ or $ap(x)$, where $a$ is a nonzero constant.
Since $k$ is a field, nonzero constants are units, and so $p(x)$ is irreducible. •

This characterization of irreducible polynomials does not apply to polynomial
rings $R[x]$ when $R$ is not a field. In $\mathbb{Q}[x]$, the polynomial $f(x) = 2x + 2$
is irreducible (because 2 is a unit). However, $f(x)$ is not irreducible in $\mathbb{Z}[x]$
because $f(x) = 2(x + 1)$, and neither 2 nor $x + 1$ is a unit in $\mathbb{Z}[x]$.

A linear polynomial $f(x) \in k[x]$, where $k$ is a field, is always irreducible [if
$f = gh$, then $1 = \deg(f) = \deg(g) + \deg(h)$, and so one of $g$ or $h$ must have
degree 0 while the other has degree $1 = \deg(f)$]. There are polynomial rings
over fields in which linear polynomials are the only irreducible polynomials. For
example, the fundamental theorem of algebra says that $\mathbb{C}[x]$ is such a ring.

Just as the definition of divisibility in a commutative ring $R$ depends on $R$,
so, too, does irreducibility of a polynomial $p(x) \in k[x]$ depend on the commutative
ring $k[x]$ and hence on the field $k$. For example, $p(x) = x^2 - 2$ is irreducible
in $\mathbb{Q}[x]$, but it factors as $(x + \sqrt{2})(x - \sqrt{2})$ in $\mathbb{R}[x]$.

Proposition 3.65. Let $k$ be a field and let $f(x) \in k[x]$ be a quadratic or cubic
polynomial. Then $f(x)$ is irreducible in $k[x]$ if and only if $f(x)$ does not have a
root in $k$.

Proof. If $f(x)$ has a root $a$ in $k$, then Proposition 3.49 shows that $f(x)$ has an
honest factorization, and so it is not irreducible.

Conversely, assume that $f(x)$ is not irreducible, i.e., there is a factorization
$f(x) = g(x)h(x)$ in $k[x]$ with $\deg(g) < \deg(f)$ and $\deg(h) < \deg(f)$. By
Lemma 3.18, $\deg(f) = \deg(g) + \deg(h)$. Since $\deg(f) = 2$ or 3, one of $\deg(g)$,
$\deg(h)$ must be 1, and Proposition 3.49 says that $f(x)$ has a root in $k$. •
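
Over a finite prime field, Proposition 3.65 gives a finite test: try every element of the field as a root. The sketch below assumes $k = \mathbb{F}_p$ with $p$ prime and integer coefficient lists (constant term first); the function names are hypothetical. It refuses degrees other than 2 and 3, since the root criterion is not valid there.

```python
def has_root_mod_p(coeffs, p):
    """True if f has a root in F_p; coeffs are listed constant term first."""
    return any(
        sum(c * pow(a, i, p) for i, c in enumerate(coeffs)) % p == 0
        for a in range(p)
    )

def is_irreducible_deg_2_or_3(coeffs, p):
    """Apply the root criterion of Proposition 3.65 over F_p (p prime)."""
    coeffs = [c % p for c in coeffs]
    while coeffs and coeffs[-1] == 0:      # discard leading zeros mod p
        coeffs.pop()
    degree = len(coeffs) - 1
    if degree not in (2, 3):
        raise ValueError("the root criterion applies only in degrees 2 and 3")
    return not has_root_mod_p(coeffs, p)
```

For example, $x^2 + 1$ is irreducible in $\mathbb{F}_3[x]$ but not in $\mathbb{F}_5[x]$, where $2$ is a root.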

This proposition is false for larger degrees. For example,

$$x^4 + 2x^2 + 1 = (x^2 + 1)^2$$

obviously factors in $\mathbb{R}[x]$, but it has no real roots.


Proposition 3.66. If $k$ is a field, then every nonconstant $f(x) \in k[x]$ has a
factorization

$$f(x) = a\,p_1(x) \cdots p_t(x),$$

where $a$ is a nonzero constant and the $p_i(x)$ are monic irreducible polynomials.

Proof. We prove the proposition for a polynomial $f(x)$ by (the second form of)
induction on $\deg(f) \ge 1$. If $\deg(f) = 1$, then $f(x) = ax + c = a(x + a^{-1}c)$;
as every linear polynomial, $x + a^{-1}c$ is irreducible, and so it is a product of
irreducibles (our usage of the word product allows only one factor). Assume now
that $\deg(f) > 1$. If $f(x)$ is irreducible and its leading coefficient is $a$, write
$f(x) = a(a^{-1}f(x))$; we are done, for $a^{-1}f(x)$ is monic. If $f(x)$ is not
irreducible, then $f(x) = g(x)h(x)$, where $\deg(g) < \deg(f)$ and $\deg(h) < \deg(f)$.
By the inductive hypothesis, there are factorizations $g(x) = b\,p_1(x) \cdots p_m(x)$
and $h(x) = c\,q_1(x) \cdots q_n(x)$, where the $p$’s and $q$’s are monic irreducibles. It
follows that $f(x) = (bc)\,p_1(x) \cdots p_m(x)\,q_1(x) \cdots q_n(x)$, as desired. •

The analog of the fundamental theorem of arithmetic, uniqueness of the
monic irreducible factors, will be proved in the next section.

If $k$ is a field, it is easy to see that if $p(x), q(x) \in k[x]$ are irreducible
polynomials, then $p(x) \mid q(x)$ if and only if there is a unit $u$ with $q(x) = u\,p(x)$.
If, in addition, both $p(x)$ and $q(x)$ are monic, then $p(x) \mid q(x)$ implies $p(x) = q(x)$.
Here is the analog of Proposition 1.31.


Lemma 3.67. Let $k$ be a field, let $p(x), f(x) \in k[x]$, and let $d(x) = (p, f)$ be
their gcd. If $p(x)$ is a monic irreducible, then

$$d(x) = \begin{cases} 1 & \text{if } p(x) \nmid f(x), \\ p(x) & \text{if } p(x) \mid f(x). \end{cases}$$

Proof. The only monic divisors of $p(x)$ are 1 and $p(x)$. If $p(x) \mid f(x)$, then
$d(x) = p(x)$, for $p(x)$ is monic. If $p(x) \nmid f(x)$, then the only monic common
divisor is 1, and so $d(x) = 1$. •


Theorem 3.68 (Euclid’s Lemma). Let $k$ be a field and let $f(x), g(x) \in k[x]$.
If $p(x)$ is an irreducible polynomial in $k[x]$ and if $p(x) \mid f(x)g(x)$, then
$p(x) \mid f(x)$ or $p(x) \mid g(x)$. More generally, if $p(x) \mid f_1(x) \cdots f_n(x)$, then
$p(x) \mid f_i(x)$ for some $i$.

Proof. If $p \mid f$, we are done. If $p \nmid f$, then the lemma says that $\gcd(p, f) = 1$.
There are thus polynomials $s(x)$ and $t(x)$ with $1 = sp + tf$, and so

$$g = spg + tfg.$$

Since $p \mid fg$, it follows that $p \mid g$, as desired. The second statement follows by
induction on $n \ge 2$. •

Definition. Two polynomials $f(x), g(x) \in k[x]$, where $k$ is a field, are called
relatively prime if their gcd is 1.

Corollary 3.69. Let $f(x), g(x), h(x) \in k[x]$, where $k$ is a field, and let $h(x)$
and $f(x)$ be relatively prime. If $h(x) \mid f(x)g(x)$, then $h(x) \mid g(x)$.

Proof. By hypothesis, $fg = hq$ for some $q(x) \in k[x]$. There are polynomials
$s$ and $t$ with $1 = sf + th$, and so $g = sfg + thg = shq + thg = h(sq + tg)$;
that is, $h \mid g$. •

Definition. If $k$ is a field, then a rational function $f(x)/g(x) \in k(x)$ is in lowest
terms if $f(x)$ and $g(x)$ are relatively prime.

Proposition 3.70. If $k$ is a field, every nonzero $f(x)/g(x) \in k(x)$ can be put in
lowest terms.

Proof. If $d = (f, g)$, then $f = df'$ and $g = dg'$ in $k[x]$. Moreover, $f'$ and $g'$
are relatively prime, for if $h$ were a nonconstant common divisor of $f'$ and $g'$,
then $hd$ would be a common divisor of $f$ and $g$ of degree greater than that of $d$.
Now $f/g = df'/dg' = f'/g'$, and the latter is in lowest terms. •

The same complaint about computing gcd’s that arose in Z arises here, and 
it has the same resolution. 


Theorem 3.71 (Euclidean Algorithm). If $k$ is a field and $f(x), g(x) \in k[x]$,
then there is an algorithm computing the gcd $(f(x), g(x))$, and there is an
algorithm finding a pair of polynomials $s(x)$ and $t(x)$ with $(f, g) = sf + tg$.

Proof. The proof is just a repetition of the proof of the euclidean algorithm in
$\mathbb{Z}$: iterated application of the division algorithm.

$$\begin{aligned}
g &= q_0 f + r_1, & \deg(r_1) &< \deg(f) \\
f &= q_1 r_1 + r_2, & \deg(r_2) &< \deg(r_1) \\
r_1 &= q_2 r_2 + r_3, & \deg(r_3) &< \deg(r_2) \\
r_2 &= q_3 r_3 + r_4, & \deg(r_4) &< \deg(r_3) \\
&\;\;\vdots
\end{aligned}$$

As in the proof of Theorem 1.41, the last nonzero remainder is a common divisor
which is divisible by every common divisor. Since the remainder may not be
monic (even if $f$ and $g$ are both monic, the remainder $r = g - qf$ may not be
monic), one must make it monic, as in Lemma 3.55. •

Example 3.72.

Use the euclidean algorithm to find the gcd $(x^5 + 1, x^3 + 1)$ in $\mathbb{Q}[x]$.

$$\begin{aligned}
x^5 + 1 &= x^2(x^3 + 1) + (-x^2 + 1) \\
x^3 + 1 &= (-x)(-x^2 + 1) + (x + 1) \\
-x^2 + 1 &= (-x + 1)(x + 1).
\end{aligned}$$

We conclude that $x + 1$ is the gcd. ◄

Example 3.73.

Find the gcd in $\mathbb{Q}[x]$ of

$$f(x) = x^3 - x^2 - x + 1 \quad\text{and}\quad g(x) = x^3 + 4x^2 + x - 6.$$

Note that $f(x), g(x) \in \mathbb{Z}[x]$, and $\mathbb{Z}$ is not a field. As we proceed, rational
numbers may enter, for $\mathbb{Q}$ is the smallest field containing $\mathbb{Z}$. Here are the equations.

$$\begin{aligned}
g &= 1 \cdot f + (5x^2 + 2x - 7) \\
f &= \left(\tfrac{1}{5}x - \tfrac{7}{25}\right)(5x^2 + 2x - 7) + \left(\tfrac{24}{25}x - \tfrac{24}{25}\right) \\
5x^2 + 2x - 7 &= \left(\tfrac{125}{24}x + \tfrac{175}{24}\right)\left(\tfrac{24}{25}x - \tfrac{24}{25}\right).
\end{aligned}$$

We conclude that the gcd is $x - 1$ [which is $\tfrac{24}{25}x - \tfrac{24}{25}$ made monic]. The reader
should find $s(x), t(x)$ expressing $x - 1$ as a linear combination (as in arithmetic,
work from the bottom up).

As a computational aid, one can clear denominators at any stage. For example,
one can multiply the second equation above by 25 and replace it by

$$25f = (5x - 7)(5x^2 + 2x - 7) + (24x - 24);$$

after all, we ultimately multiply by a unit to obtain a monic gcd. ◄
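
The bookkeeping needed to find $s(x)$ and $t(x)$ can be automated by carrying two extra sequences alongside the remainders, a standard variant of the euclidean algorithm that avoids back-substitution. The following is a sketch over $\mathbb{Q}$, assuming coefficient lists with the constant term first; all function names are our own.

```python
from fractions import Fraction

def trim(p):
    """Copy p with Fraction entries and no trailing zeros (zero polynomial: [])."""
    p = [Fraction(c) for c in p]
    while p and p[-1] == 0:
        p.pop()
    return p

def sub(a, b):
    """Return the polynomial a - b."""
    n = max(len(a), len(b))
    a = list(a) + [0] * (n - len(a))
    b = list(b) + [0] * (n - len(b))
    return trim([x - y for x, y in zip(a, b)])

def mul(a, b):
    """Return the polynomial a * b."""
    if not a or not b:
        return []
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return trim(out)

def poly_divmod(g, f):
    """Division algorithm: g = q*f + r with r == [] or deg(r) < deg(f)."""
    f, r = trim(f), trim(g)
    q = [Fraction(0)] * max(len(r) - len(f) + 1, 0)
    while r and len(r) >= len(f):
        shift = len(r) - len(f)
        coeff = r[-1] / f[-1]
        q[shift] += coeff
        r = sub(r, mul([0] * shift + [coeff], f))
    return trim(q), r

def extended_gcd(f, g):
    """Return (d, s, t) with d the monic gcd of f, g and d = s*f + t*g."""
    r0, r1 = trim(f), trim(g)
    s0, s1 = [Fraction(1)], []          # invariant: r_i = s_i*f + t_i*g
    t0, t1 = [], [Fraction(1)]
    while r1:
        q, r = poly_divmod(r0, r1)
        r0, r1 = r1, r
        s0, s1 = s1, sub(s0, mul(q, s1))
        t0, t1 = t1, sub(t0, mul(q, t1))
    if r0:                               # make the last nonzero remainder monic
        lead = r0[-1]
        r0, s0, t0 = ([c / lead for c in p] for p in (r0, s0, t0))
    return r0, s0, t0
```

Applied to the polynomials of Example 3.73, `extended_gcd([1, -1, -1, 1], [-6, 1, 4, 1])` returns the coefficients of $x - 1$ as its first component, together with polynomials $s$ and $t$ satisfying $sf + tg = x - 1$.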

Example 3.74.

Find the gcd in $\mathbb{F}_5[x]$ of

$$f(x) = x^3 - x^2 - x + 1 \quad\text{and}\quad g(x) = x^3 + 4x^2 + x - 6.$$

The terms in the euclidean algorithm simplify considerably.

$$\begin{aligned}
g &= 1 \cdot f + (2x + 3) \\
f &= (3x^2 + 2)(2x + 3).
\end{aligned}$$

The gcd is $x - 1$ (which is $2x + 3$ made monic). ◄
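
The same computation can be carried out mechanically in $\mathbb{F}_p[x]$ by reducing every coefficient modulo $p$. The sketch below assumes $p$ is prime and uses Python's three-argument `pow` with exponent $-1$ (available from Python 3.8) for inverses modulo $p$; the function names are illustrative.

```python
def trim_mod(poly, p):
    """Reduce coefficients mod p and drop trailing zeros (constant term first)."""
    poly = [c % p for c in poly]
    while poly and poly[-1] == 0:
        poly.pop()
    return poly

def remainder_mod(g, f, p):
    """Remainder of g on division by a nonzero f in F_p[x]."""
    f, r = trim_mod(f, p), trim_mod(g, p)
    inv_lead = pow(f[-1], -1, p)           # inverse of the leading coefficient
    while r and len(r) >= len(f):
        shift = len(r) - len(f)
        coeff = r[-1] * inv_lead % p
        for i, c in enumerate(f):
            r[i + shift] = (r[i + shift] - coeff * c) % p
        r = trim_mod(r, p)
    return r

def gcd_mod(f, g, p):
    """Monic gcd of f and g in F_p[x], by the euclidean algorithm."""
    a, b = trim_mod(f, p), trim_mod(g, p)
    while b:
        a, b = b, remainder_mod(a, b, p)
    if a:
        inv_lead = pow(a[-1], -1, p)
        a = [c * inv_lead % p for c in a]  # make the gcd monic
    return a
```

For the polynomials of Example 3.74, `gcd_mod([1, -1, -1, 1], [-6, 1, 4, 1], 5)` returns `[4, 1]`, that is, $x + 4 = x - 1$ in $\mathbb{F}_5[x]$.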

Here are factorizations of the polynomials $f(x)$ and $g(x)$ in Example 3.73:

$$f(x) = x^3 - x^2 - x + 1 = (x - 1)^2(x + 1)$$

and

$$g(x) = x^3 + 4x^2 + x - 6 = (x - 1)(x + 2)(x + 3);$$

one could have seen that $x - 1$ is the gcd, had these factorizations been known at
the outset. This suggests that an analog of the fundamental theorem of arithmetic
could provide another way to compute gcd’s. Such an analog does exist (see
Proposition 3.86). As a practical matter, however, factoring polynomials is a
very difficult task, and the euclidean algorithm is the best way to compute gcd’s.
Here is an unexpected bonus from the euclidean algorithm.

Corollary 3.75. Let $k$ be a subfield of a field $K$, so that $k[x]$ is a subring of
$K[x]$. If $f(x), g(x) \in k[x]$, then their gcd in $k[x]$ is equal to their gcd in $K[x]$.

Proof. The division algorithm in $K[x]$ gives

$$g(x) = Q(x)f(x) + R(x),$$

where $Q(x), R(x) \in K[x]$ and either $R(x) = 0$ or $\deg(R) < \deg(f)$; since
$f(x), g(x) \in k[x]$, the division algorithm in $k[x]$ gives

$$g(x) = q(x)f(x) + r(x),$$

where $q(x), r(x) \in k[x]$ and either $r(x) = 0$ or $\deg(r) < \deg(f)$. But the
equation $g(x) = q(x)f(x) + r(x)$ also holds in $K[x]$ because $k[x] \subseteq K[x]$, so
that the uniqueness of quotient and remainder in the division algorithm in $K[x]$
gives $Q(x) = q(x) \in k[x]$ and $R(x) = r(x) \in k[x]$. Therefore, the list of
equations occurring in the euclidean algorithm in $K[x]$ is exactly the same list
occurring in the euclidean algorithm in the smaller ring $k[x]$, and so the same
gcd is obtained in both polynomial rings. •

Even though there are more divisors with complex coefficients, the gcd of
$x^3 - x^2 + x - 1$ and $x^4 - 1$ is $(x - 1)(x^2 + 1)$, whether computed in $\mathbb{R}[x]$ or in $\mathbb{C}[x]$.

We have seen, when $k$ is a field, that there are many analogs for $k[x]$ of
theorems proved for $\mathbb{Z}$. The essential reason for this is that both rings are PID’s.

Euclidean Rings

There are rings other than $\mathbb{Z}$ and $k[x]$, where $k$ is a field, that have a division
algorithm. In particular, we present an example of such a ring in which the
quotient and remainder are not unique. We begin by generalizing a property
shared by both $\mathbb{Z}$ and $k[x]$.

Definition. A commutative ring $R$ is a euclidean ring if it is a domain and
there is a function

$$\partial \colon R^\times \to \mathbb{N}$$

(where $R^\times$ denotes the nonzero elements of $R$), called a degree function, such
that

(i) $\partial(f) \le \partial(fg)$ for all $f, g \in R^\times$;

(ii) for all $f, g \in R$ with $f \in R^\times$, there exist $q, r \in R$ with

$$g = qf + r,$$

and either $r = 0$ or $\partial(r) < \partial(f)$.

Example 3.76.

(i) Every field $R$ is a euclidean ring with degree function $\partial$ identically 0: if
$g \in R$ and $f \in R^\times$, set $q = gf^{-1}$ and $r = 0$. Conversely, if $R$ is a domain
for which the zero function $\partial \colon R^\times \to \mathbb{N}$ is a degree function, then $R$ is a
field. If $f \in R^\times$, then there are $q, r \in R$ with $1 = qf + r$. If $r \ne 0$, then
$\partial(r) < \partial(f) = 0$, a contradiction. Hence, $r = 0$ and $1 = qf$, so that $f$ is
a unit; therefore, $R$ is a field.

(ii) The domain $\mathbb{Z}$ is a euclidean ring with degree function $\partial(m) = |m|$. In $\mathbb{Z}$,
we have

$$\partial(mn) = |mn| = |m||n| = \partial(m)\partial(n).$$

(iii) When $k$ is a field, the domain $k[x]$ is a euclidean ring with degree function
the usual degree of a nonzero polynomial. In $k[x]$, we have

$$\partial(fg) = \deg(fg) = \deg(f) + \deg(g) = \partial(f) + \partial(g) \ge \partial(f).$$

Certain properties of a particular degree function may not be enjoyed by
all degree functions. For example, the degree function in $\mathbb{Z}$ in part (ii) is
multiplicative: $\partial(mn) = \partial(m)\partial(n)$, while the degree function on $k[x]$ here
is not multiplicative. If a degree function $\partial$ is multiplicative, that is, if

$$\partial(fg) = \partial(f)\partial(g),$$

then $\partial$ is called a euclidean norm. ◄
Example 3.77.

The Gaussian$^{12}$ integers $\mathbb{Z}[i]$ form a euclidean ring whose degree function

$$\partial(a + bi) = a^2 + b^2$$

is a euclidean norm.

To see that $\partial$ is a multiplicative degree function, note first that if $\alpha = a + bi$, then

$$\partial(\alpha) = \alpha\bar{\alpha},$$

where $\bar{\alpha} = a - bi$ is the complex conjugate of $\alpha$. It follows that
$\partial(\alpha\beta) = \partial(\alpha)\partial(\beta)$ for all $\alpha, \beta \in \mathbb{Z}[i]$, because

$$\partial(\alpha\beta) = \alpha\beta\,\overline{\alpha\beta} = \alpha\beta\,\bar{\alpha}\bar{\beta} = \alpha\bar{\alpha}\,\beta\bar{\beta} = \partial(\alpha)\partial(\beta);$$

indeed, this is even true for all $\alpha, \beta \in \mathbb{Q}[i] = \{x + yi : x, y \in \mathbb{Q}\}$, by
Corollary 1.20.

We now show that $\partial$ satisfies the first property of a degree function. If
$\beta = c + id \in \mathbb{Z}[i]$ and $\beta \ne 0$, then

$$1 \le \partial(\beta),$$

for $\partial(\beta) = c^2 + d^2$ is a positive integer; it follows that if $\alpha, \beta \in \mathbb{Z}[i]$ and $\beta \ne 0$,
then

$$\partial(\alpha) \le \partial(\alpha)\partial(\beta) = \partial(\alpha\beta).$$

Let us show that $\partial$ also satisfies the second desired property. Given $\alpha, \beta \in \mathbb{Z}[i]$
with $\beta \ne 0$, regard $\alpha/\beta$ as an element of $\mathbb{C}$. Rationalizing the denominator
gives $\alpha/\beta = \alpha\bar{\beta}/\beta\bar{\beta} = \alpha\bar{\beta}/\partial(\beta)$, so that

$$\alpha/\beta = x + yi,$$

where $x, y \in \mathbb{Q}$. Write $x = m + u$ and $y = n + v$, where $m, n \in \mathbb{Z}$ are integers
closest to $x$ and $y$, respectively; thus, $|u|, |v| \le \tfrac{1}{2}$. (If $x$ or $y$ has the form $m + \tfrac{1}{2}$,
where $m$ is an integer, then there is a choice of nearest integer: $x = m + \tfrac{1}{2}$ or
$x = (m + 1) - \tfrac{1}{2}$; a similar choice arises if $x$ or $y$ has the form $m - \tfrac{1}{2}$.) It follows
that

$$\alpha = \beta(m + ni) + \beta(u + vi).$$

Notice that $\beta(u + vi) \in \mathbb{Z}[i]$, for it is equal to $\alpha - \beta(m + ni)$. Finally, we
have $\partial(\beta(u + vi)) = \partial(\beta)\partial(u + vi)$, and so $\partial$ will be a degree function if
$\partial(u + vi) < 1$. And this is so, for the inequalities $|u| \le \tfrac{1}{2}$ and $|v| \le \tfrac{1}{2}$ give
$u^2 \le \tfrac{1}{4}$ and $v^2 \le \tfrac{1}{4}$, and hence $\partial(u + vi) = u^2 + v^2 \le \tfrac{1}{4} + \tfrac{1}{4} = \tfrac{1}{2} < 1$.
Therefore, $\partial(\beta(u + vi)) < \partial(\beta)$, and so $\mathbb{Z}[i]$ is a euclidean ring whose degree
function is a euclidean norm.

12 The Gaussian integers are so called because Gauss tacitly used $\mathbb{Z}[i]$ and its euclidean
norm $\partial$ to investigate biquadratic residues.

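The ring $\mathbb{Z}[i]$ of Gaussian integers is a euclidean ring in which quotients and
remainders may not be unique.$^{13}$ For example, let $\alpha = 3 + 5i$ and $\beta = 2$. Then
$\alpha/\beta = \tfrac{3}{2} + \tfrac{5}{2}i$; the choices are:

$$\begin{aligned}
m = 1 \text{ and } u = \tfrac{1}{2} \quad&\text{or}\quad m = 2 \text{ and } u = -\tfrac{1}{2}; \\
n = 2 \text{ and } v = \tfrac{1}{2} \quad&\text{or}\quad n = 3 \text{ and } v = -\tfrac{1}{2}.
\end{aligned}$$

There are four quotients after dividing $3 + 5i$ by 2 in $\mathbb{Z}[i]$, and each of the
remainders (e.g., $1 + i$) has degree $2 < 4 = \partial(2)$:

$$\begin{aligned}
3 + 5i &= 2(1 + 2i) + (1 + i); \\
&= 2(1 + 3i) + (1 - i); \\
&= 2(2 + 2i) + (-1 + i); \\
&= 2(2 + 3i) + (-1 - i). \quad ◄
\end{aligned}$$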
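
The division procedure of Example 3.77 is easy to program. In the sketch below, a Gaussian integer $a + bi$ is represented by the pair `(a, b)` (an assumption made for illustration), and the nearest integers are found with Python's `round` applied to exact fractions. When there is a tie, `round` picks the even neighbor, so the function returns just one of the admissible quotients.

```python
from fractions import Fraction

def norm(z):
    """The euclidean norm of z = (a, b), namely a^2 + b^2."""
    a, b = z
    return a * a + b * b

def gauss_divmod(alpha, beta):
    """Return (q, r) in Z[i] with alpha = beta*q + r and norm(r) < norm(beta)."""
    a, b = alpha
    c, d = beta
    n = norm(beta)
    # alpha/beta = alpha * conj(beta) / norm(beta) = x + yi with x, y rational
    x = Fraction(a * c + b * d, n)
    y = Fraction(b * c - a * d, n)
    m, k = round(x), round(y)              # integers nearest to x and y
    # r = alpha - beta*(m + ki), computed with integer arithmetic
    r = (a - (c * m - d * k), b - (c * k + d * m))
    return (m, k), r
```

For $\alpha = 3 + 5i$ and $\beta = 2$, this returns the quotient $2 + 2i$ and the remainder $-1 + i$, the third of the four equations above.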


Proposition 3.78. Every euclidean ring $R$ is a PID. In particular, the ring $\mathbb{Z}[i]$
of Gaussian integers is a PID.

Proof. The reader can adapt the proof of Proposition 3.59: if $I$ is a nonzero
ideal in $R$, then $I = (d)$, where $d$ is an element in $I$ of least degree. •

The converse of Proposition 3.78 is false: there are PID’s that are not euclidean
rings, as we will see in the next example.

13 Note the equations in $\mathbb{Z}$:

\[ 3 = 1 \cdot 2 + 1; \qquad 3 = 2 \cdot 2 - 1. \]

Now $|-1| = |1| < |2|$, so that quotients and remainders in $\mathbb{Z}$ are not unique! In Theorem 1.29,
we forced uniqueness by demanding that remainders be non-negative.
Example 3.79.

It is shown in algebraic number theory (see the remark on page 274) that the ring

\[ R = \{a + b\alpha : a, b \in \mathbb{Z}\}, \]

where $\alpha = \tfrac12(1 + \sqrt{-19})$, is a PID [$R$ is the ring of algebraic integers in the
quadratic number field $\mathbb{Q}(\sqrt{-19})$]. In 1949, T. S. Motzkin showed that $R$ is not
a euclidean ring. To do this, he found the following property of euclidean rings
that does not mention the degree function.


Definition. An element $u$ in a domain $R$ is a universal side divisor if $u$ is not a
unit and, for every $x \in R$, either $u \mid x$ or there is a unit $z \in R$ with $u \mid (x + z)$.


Proposition 3.80. If $R$ is a euclidean ring that is not a field, then $R$ has a
universal side divisor.

Proof. Define

\[ S = \{\partial(v) : v \neq 0 \text{ and } v \text{ is not a unit}\}, \]

where $\partial$ is the degree function on $R$. Since $R$ is not a field, there is some $v \in R^{\times}$
which is not a unit, and so $S$ is a nonempty subset of the natural numbers. By the
Least Integer Axiom, there is a nonunit $u \in R^{\times}$ with $\partial(u)$ the smallest element
of $S$. We claim that $u$ is a universal side divisor. If $x \in R$, there are elements $q$
and $r$ with $x = qu + r$, where either $r = 0$ or $\partial(r) < \partial(u)$. If $r = 0$, then $u \mid x$;
if $r \neq 0$, then $r$ must be a unit, otherwise its existence contradicts $\partial(u)$ being the
smallest number in $S$; in this case $u \mid (x + z)$, where $z = -r$ is a unit. We have shown that $u$ is a universal side divisor. •
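To make the definition concrete, here is a small Python check in the euclidean ring $\mathbb{Z}$, whose only units are $1$ and $-1$. It is a sketch with a hypothetical function name; it relies on the observation that, in $\mathbb{Z}$, the condition depends only on the residue of $x$ modulo $u$, so testing one full set of residues suffices.

```python
def is_universal_side_divisor(u):
    """Test the definition in Z: u is a nonunit and, for every x,
    u | x or u | (x + 1) or u | (x - 1)."""
    if u in (0, 1, -1):          # 0 fails the definition; 1 and -1 are units
        return False
    n = abs(u)
    # The condition only depends on x mod n, so check the residues 0..n-1.
    return all(x % n == 0 or (x + 1) % n == 0 or (x - 1) % n == 0
               for x in range(n))

print([u for u in range(2, 20) if is_universal_side_divisor(u)])  # [2, 3]
```

The positive universal side divisors found are 2 and 3; the proof above produces a nonunit of least degree, which in $\mathbb{Z}$ (with degree $|u|$) is $\pm 2$.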

Motzkin then showed that if $\alpha = \tfrac12(1 + \sqrt{-19})$, then the ring

\[ \{a + b\alpha : a, b \in \mathbb{Z}\} \]

has no universal side divisors, concluding that this PID is not a euclidean ring.
For details, we refer the reader to K. S. Williams, “Note on Non-euclidean Principal
Ideal Domains,” Math. Mag. 48 (1975), 176-177. ◄


Remark. If $\pi$ is an irreducible element in a PID $R$, then every divisor of $\pi$ is
either a unit or an associate $u\pi$, where $u \in R$ is a unit. It follows that if $\beta \in R$
and $\pi \nmid \beta$, then $1$ is a gcd of $\pi$ and $\beta$. See Proposition 1.31. ◄


The following result holds for every PID; we will soon apply it to $\mathbb{Z}[i]$.
Proposition 3.81. Let $R$ be a PID.

(i) Each $\alpha, \beta \in R$ has a gcd, $\delta$, which is a linear combination of $\alpha$ and $\beta$:
there are $\sigma, \tau \in R$ such that

\[ \delta = \sigma\alpha + \tau\beta. \]

(ii) If an irreducible element $\pi \in R$ divides a product $\alpha\beta$, then either $\pi \mid \alpha$ or
$\pi \mid \beta$.

Proof.

(i) We may assume that at least one of $\alpha$ and $\beta$ is not zero (otherwise, the gcd is
0 and the result is obvious). Consider the set $I$ of all the linear combinations:

\[ I = \{\sigma\alpha + \tau\beta : \sigma, \tau \text{ in } R\}. \]

Now $\alpha$ and $\beta$ are in $I$ (take $\sigma = 1$ and $\tau = 0$ or vice versa). It is easy to check
that $I$ is an ideal in $R$, and so there is $\delta \in I$ with $I = (\delta)$, because $R$ is a PID;
we claim that $\delta$ is a gcd of $\alpha$ and $\beta$.

Since $\alpha \in I = (\delta)$, we have $\alpha = \rho\delta$ for some $\rho \in R$; that is, $\delta$ is a divisor
of $\alpha$; similarly, $\delta$ is a divisor of $\beta$, and so $\delta$ is a common divisor of $\alpha$ and $\beta$.
Since $\delta \in I$, it is a linear combination of $\alpha$ and $\beta$: there are $\sigma, \tau \in R$ with

\[ \delta = \sigma\alpha + \tau\beta. \]

Finally, if $\gamma$ is any common divisor of $\alpha$ and $\beta$, then $\alpha = \gamma\alpha'$ and $\beta = \gamma\beta'$, so
that $\gamma$ divides $\delta$, for $\delta = \sigma\alpha + \tau\beta = \gamma(\sigma\alpha' + \tau\beta')$. We conclude that $\delta$ is a gcd.

(ii) If $\pi \mid \alpha$, we are done. If $\pi \nmid \alpha$, then the remark says that $1$ is a gcd of $\pi$ and
$\alpha$. There are thus $\sigma, \tau \in R$ with $1 = \sigma\pi + \tau\alpha$, and so

\[ \beta = \sigma\pi\beta + \tau\alpha\beta. \]

Since $\pi \mid \alpha\beta$, it follows that $\pi \mid \beta$, as desired. •
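In the simplest PID, $\mathbb{Z}$, the coefficients $\sigma$ and $\tau$ of part (i) can be computed by the extended euclidean algorithm. The sketch below is an illustration only (the proposition itself is an existence statement and prescribes no algorithm); the function name bezout is chosen here.

```python
def bezout(a, b):
    """Return (d, s, t) with d = gcd(a, b) >= 0 and d == s*a + t*b."""
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r
        # Invariant: old_r == old_s*a + old_t*b and r == s*a + t*b.
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    if old_r < 0:                      # normalize the gcd to be non-negative
        old_r, old_s, old_t = -old_r, -old_s, -old_t
    return old_r, old_s, old_t

d, s, t = bezout(240, 46)
print(d, s * 240 + t * 46)  # 2 2

# Part (ii) with pi = 7, alpha = 10, beta = 21: 7 divides 210 but not 10.
one, s, t = bezout(7, 10)
print(one, s * 7 * 21 + t * 10 * 21)  # 1 21
```

The last line is the identity $\beta = \sigma\pi\beta + \tau\alpha\beta$ from the proof: both summands are visibly multiples of 7, and so 7 divides 21.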

If $n$ is an odd number, then either $n \equiv 1 \bmod 4$ or $n \equiv 3 \bmod 4$. In particular,
the odd prime numbers are divided into two classes. Thus, 5, 13, 17 are
congruent to 1 mod 4, for example, while 3, 7, 11 are congruent to 3 mod 4.

Lemma 3.82. If $p$ is a prime and $p \equiv 1 \bmod 4$, then there is an integer $m$ with

\[ m^2 \equiv -1 \bmod p. \]

Proof. If $G = (\mathbb{F}_p)^{\times}$ is the multiplicative group of nonzero elements in $\mathbb{F}_p$,
then $|G| = p - 1 \equiv 0 \bmod 4$; that is, 4 is a divisor of $|G|$. By Proposition 2.122,
$G$ contains a subgroup $S$ of order 4. By Proposition 2.132, either $S$ is cyclic or
$a^2 = 1$ for all $a \in S$. Since $\mathbb{F}_p$ is a field, however, it cannot contain 4 roots of the
quadratic polynomial $x^2 - 1$. Therefore, $S$ is cyclic, say, $S = \langle [m] \rangle$, where $[m]$
is the congruence class of $m$ mod $p$. Since $[m]$ has order 4, we have $[m^4] = [1]$.
Moreover, $[m^2] \neq [1]$ (lest $[m]$ have order $\le 2 < 4$), and so $[m^2] = [-1]$, for
$[-1]$ is the unique element in $S$ of order 2. Therefore, $m^2 \equiv -1 \bmod p$. •
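For small primes the integer $m$ of Lemma 3.82 can simply be found by search. The following sketch (function name chosen here; brute force, not the group-theoretic argument of the proof) also shows that no such $m$ exists for the primes 3, 7, 11.

```python
def sqrt_of_minus_one(p):
    """Return the least m in 1..p-1 with m*m ≡ -1 (mod p), or None."""
    for m in range(1, p):
        if (m * m + 1) % p == 0:
            return m
    return None

for p in (5, 13, 17, 3, 7, 11):
    print(p, sqrt_of_minus_one(p))
# 5 2
# 13 5
# 17 4
# 3 None
# 7 None
# 11 None
```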

Theorem 3.83 (Fermat’s Two-Squares Theorem).$^{14}$ An odd prime $p$ is a sum
of two squares,

\[ p = a^2 + b^2, \]

where $a$ and $b$ are integers, if and only if $p \equiv 1 \bmod 4$.

Proof. For any integer $a$ we have $a \equiv r \bmod 4$, where $r = 0, 1, 2$, or $3$, and so
$a^2 \equiv r^2 \bmod 4$. But, mod 4,

\[ 0^2 \equiv 0, \quad 1^2 \equiv 1, \quad 2^2 = 4 \equiv 0, \quad\text{and}\quad 3^2 = 9 \equiv 1, \]

so that $a^2 \equiv 0$ or $1 \bmod 4$. It follows, for any integers $a$ and $b$, that $a^2 + b^2 \not\equiv 3 \bmod 4$.
Therefore, if $p = a^2 + b^2$, where $a$ and $b$ are integers, then $p \not\equiv 3 \bmod 4$.
Since $p$ is odd, either $p \equiv 1 \bmod 4$ or $p \equiv 3 \bmod 4$. We have just
ruled out the latter possibility, and so $p \equiv 1 \bmod 4$.

Conversely, assume that $p \equiv 1 \bmod 4$. By the lemma, there is an integer $m$
such that

\[ p \mid (m^2 + 1). \]

In $\mathbb{Z}[i]$, there is a factorization $m^2 + 1 = (m + i)(m - i)$, and so

\[ p \mid (m + i)(m - i) \text{ in } \mathbb{Z}[i]. \]

If $p \mid (m + i)$ in $\mathbb{Z}[i]$, then there are integers $u$ and $v$ with $m + i = p(u + iv)$.
Taking complex conjugates, we have $m - i = p(u - iv)$, so that $p \mid (m - i)$ as
well. Therefore, $p$ divides $(m + i) - (m - i) = 2i$, which is a contradiction, for
$\partial(p) = p^2 > \partial(2i) = 4$. We conclude that $p$ is not an irreducible element, for it
does not satisfy the analog of Euclid’s lemma in Proposition 3.81. Since $\mathbb{Z}[i]$ is
a PID, there is a factorization

\[ p = \alpha\beta \text{ in } \mathbb{Z}[i] \]

in which neither $\alpha = a + ib$ nor $\beta = c + id$ is a unit. Therefore, taking norms
gives an equation in $\mathbb{Z}$:

\begin{align*}
p^2 &= \partial(p) \\
    &= \partial(\alpha\beta) \\
    &= \partial(\alpha)\partial(\beta) \\
    &= (a^2 + b^2)(c^2 + d^2).
\end{align*}

14 Fermat was the first to state this theorem, but the first published proof is due to Euler.
Gauss proved that there is only one pair of numbers $a$ and $b$ with $p = a^2 + b^2$.
Since neither $\alpha$ nor $\beta$ is a unit, $a^2 + b^2 \neq 1$ and $c^2 + d^2 \neq 1$; Euclid’s lemma gives $p \mid (a^2 + b^2)$ and
$p \mid (c^2 + d^2)$, and the equation $p^2 = (a^2 + b^2)(c^2 + d^2)$ then gives $p = a^2 + b^2$ (and $p = c^2 + d^2$). •
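The proof suggests a way to actually find $a$ and $b$: a gcd of $p$ and $m + i$ in $\mathbb{Z}[i]$ is a proper factor $a + bi$ of $p$, and its norm is $p$. The Python sketch below carries this out under stated assumptions: $p$ is a prime with $p \equiv 1 \bmod 4$, Gaussian integers are pairs (a, b), $m$ is found by search, and all function names are chosen here for illustration.

```python
def gauss_mod(alpha, beta):
    """Remainder of alpha on division by beta in Z[i] (nearest-integer quotient)."""
    a, b = alpha
    c, d = beta
    nb = c * c + d * d
    m = (2 * (a * c + b * d) + nb) // (2 * nb)   # nearest integer to Re(alpha/beta)
    n = (2 * (b * c - a * d) + nb) // (2 * nb)   # nearest integer to Im(alpha/beta)
    return (a - (c * m - d * n), b - (c * n + d * m))

def gauss_gcd(alpha, beta):
    """Euclidean algorithm in Z[i]; terminates because norms strictly decrease."""
    while beta != (0, 0):
        alpha, beta = beta, gauss_mod(alpha, beta)
    return alpha

def two_squares(p):
    """For a prime p ≡ 1 (mod 4), return (a, b) with a*a + b*b == p."""
    m = next(m for m in range(1, p) if (m * m + 1) % p == 0)  # Lemma 3.82
    a, b = gauss_gcd((p, 0), (m, 1))
    return abs(a), abs(b)

for p in (5, 13, 17, 29):
    a, b = two_squares(p)
    print(p, a, b, a * a + b * b)
```

For each prime listed, the last column printed equals $p$.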


Exercises 

3.55 Find the gcd of $x^2 - x - 2$ and $x^3 - 7x + 6$ in $\mathbb{F}_5[x]$, and express it as a linear
combination of them.

3.56 If $R$ is a domain and $f(x) \in R[x]$ has degree $n$, show that $f(x)$ has at most $n$ roots
in $R$.

3.57 Let $R$ be an arbitrary commutative ring. If $f(x) \in R[x]$ and if $a \in R$ is a root
of $f(x)$, i.e., $f(a) = 0$, prove that there is a factorization $f(x) = (x - a)g(x)$
in $R[x]$.

3.58 (i) Show that the following pseudocode implements the euclidean algorithm
finding gcd $f(x)$ and $g(x)$ in $k[x]$, where $k$ is a field.

    Input: g, f
    Output: d
    d := g; s := f
    WHILE s ≠ 0 DO
        rem := remainder(d, s)
        d := s
        s := rem
    END WHILE
    a := leading coefficient of d
    d := a^{-1} d

(ii) Find $(f, g)$, where $f(x) = x^2 + 1$, $g(x) = x^3 + x + 1 \in \mathbb{I}_3[x]$.

3.59 Prove the converse of Euclid’s lemma. Let $k$ be a field and let $f(x) \in k[x]$ be a
polynomial of degree $\ge 1$; if, whenever $f(x)$ divides a product of two polynomials,
it necessarily divides one of the factors, then $f(x)$ is irreducible.

3.60 Let $f(x), g(x) \in R[x]$, where $R$ is a domain. If the leading coefficient of $f(x)$ is
a unit in $R$, then the division algorithm gives a quotient $q(x)$ and a remainder $r(x)$
after dividing $g(x)$ by $f(x)$. Prove that $q(x)$ and $r(x)$ are uniquely determined by
$g(x)$ and $f(x)$.

3.61 Let $k$ be a field, and let $f(x), g(x) \in k[x]$ be relatively prime. If $h(x) \in k[x]$,
prove that $f(x) \mid h(x)$ and $g(x) \mid h(x)$ imply $f(x)g(x) \mid h(x)$.

*3.62 If $k$ is a field in which $1 + 1 \neq 0$, prove that $\sqrt{1 - x^2} \notin k(x)$, where $k(x)$ is the
field of rational functions.

*3.63 Let $f(x) = (x - a_1) \cdots (x - a_n) \in R[x]$, where $R$ is a commutative ring. Show
that $f(x)$ has no repeated roots (that is, all the $a_i$ are distinct) if and only if the
gcd $(f, f') = 1$, where $f'$ is the derivative of $f$.

3.64 Let $\partial$ be the degree function of a euclidean ring $R$. If $m, n \in \mathbb{N}$ and $m \ge 1$, prove
that $\partial'$ is also a degree function on $R$, where

\[ \partial'(x) = m\partial(x) + n \]

for all $x \in R$. Conclude that a euclidean ring may have no elements of degree 0 or
degree 1.

3.65 Let $R$ be a euclidean ring with degree function $\partial$.

(i) Prove that $\partial(1) \le \partial(a)$ for all nonzero $a \in R$.

(ii) Prove that a nonzero $u \in R$ is a unit if and only if $\partial(u) = \partial(1)$.

3.66 If $\alpha = \tfrac12(1 + \sqrt{-19})$, prove that the only units in $R = \{a + b\alpha : a, b \in \mathbb{Z}\}$ are
$\pm 1$.

3.67 Let $R$ be a euclidean ring with degree function $\partial$, and assume that $b \in R$ is neither
zero nor a unit. Prove, for every $i \ge 0$, that $\partial(b^i) < \partial(b^{i+1})$.

*3.68 (i) If $k$ is a field, prove that the ring of formal power series $k[[x]]$ is a PID.

(ii) Prove that every nonzero ideal in $k[[x]]$ is equal to $(x^n)$ for some $n \ge 0$.

3.69 Let $k$ be a field, and let polynomials $a_1(x), a_2(x), \ldots, a_n(x)$ in $k[x]$ be given.

(i) Show that the greatest common divisor $d(x)$ of these polynomials has the
form $\sum t_i(x)a_i(x)$, where $t_i(x) \in k[x]$ for $1 \le i \le n$.

(ii) Prove that if $c(x)$ is a monic common divisor of these polynomials, then
$c(x) \mid d(x)$.

3.70 Let $[f(x), g(x)]$ denote the lcm of $f(x), g(x) \in k[x]$, where $k$ is a field. Show
that if $f(x)g(x)$ is monic, then

\[ [f, g](f, g) = fg. \]

*3.71 If $k$ is a field, show that the ideal $(x, y)$ in $k[x, y]$ is not a principal ideal.

3.72 For every $m > 1$, prove that every ideal in $\mathbb{I}_m$ is a principal ideal. (If $m$ is composite,
then $\mathbb{I}_m$ is not a PID because it is not a domain.)

3.73 (i) Show that $x, y \in k[x, y]$ are relatively prime, but that 1 is not a linear
combination of them, i.e., there do not exist $s(x, y), t(x, y) \in k[x, y]$
with $1 = xs(x, y) + yt(x, y)$.

(ii) Show that 2 and $x$ are relatively prime in $\mathbb{Z}[x]$, but that 1 is not a linear
combination of them; that is, there do not exist $s(x), t(x) \in \mathbb{Z}[x]$ with
$1 = 2s(x) + xt(x)$.

3.74 Because $x - 1 = (\sqrt{x} + 1)(\sqrt{x} - 1)$, a student claims that $x - 1$ is not irreducible.
Explain the error of his ways.

*3.75 Prove that there are domains $R$ containing a pair of elements having no gcd. (See
the definition of gcd in general domains on page 256.)


3.6 Unique Factorization 

Here is the analog for polynomials of the fundamental theorem of arithmetic; it
shows that irreducible polynomials are “building blocks” of arbitrary polynomials
in the same sense that primes are building blocks of arbitrary integers. To
avoid long sentences, let us agree that a “product” may have only one factor.
Thus, when we say that a polynomial $f(x)$ is a product of irreducibles, we allow
the possibility that the product has only one factor, that is, that $f(x)$ itself is
irreducible.
Theorem 3.84 (Unique Factorization). If $k$ is a field, then every polynomial
$f(x) \in k[x]$ of degree $\ge 1$ is a product of a nonzero constant and monic
irreducibles. Moreover, if

\[ f(x) = ap_1(x) \cdots p_m(x) \quad\text{and}\quad f(x) = bq_1(x) \cdots q_n(x), \]

where $a$ and $b$ are nonzero constants and the $p$’s and $q$’s are monic irreducibles,
then $a = b$, $m = n$, and the $q$’s may be reindexed so that $q_i = p_i$ for all $i$.

Proof. The existence of a factorization of a polynomial $f(x) \in k[x]$ into irreducibles
was proved in Proposition 3.66, and so we need only prove the uniqueness
assertion.

An equation $f(x) = ap_1(x) \cdots p_m(x)$ shows that $a$ is the leading coefficient of
$f(x)$, since a product of monic polynomials is monic. Hence, two factorizations
of $f(x)$ give $a = b$, for each is equal to the leading coefficient of $f(x)$. It now
suffices to prove uniqueness when

\[ p_1(x) \cdots p_m(x) = q_1(x) \cdots q_n(x). \]

The proof is by induction on $M = \max\{m, n\} \ge 1$. The base step $M = 1$
is obviously true, for the given equation is $p_1(x) = q_1(x)$. For the inductive
step, the given equation shows that $p_m(x) \mid q_1(x) \cdots q_n(x)$. By Theorem 3.68,
Euclid’s lemma for polynomials, there is some $i$ with $p_m(x) \mid q_i(x)$. But $q_i(x)$,
being monic irreducible, has no monic divisors other than 1 and itself, so that
$q_i(x) = p_m(x)$. Reindexing, we may assume that $q_n(x) = p_m(x)$. Canceling
this factor, we have $p_1(x) \cdots p_{m-1}(x) = q_1(x) \cdots q_{n-1}(x)$. By the inductive
hypothesis, $m - 1 = n - 1$ (hence $m = n$) and, after possible reindexing, $q_i = p_i$
for all $i$. •

Example 3.85.

The reader may check, in $\mathbb{I}_4[x]$, that

\[ x^2 - 1 = (x - 1)(x + 1) = (x - 3)(x + 3); \]

each of the linear factors is irreducible (of course, $\mathbb{I}_4$ is not a field). Therefore,
unique factorization does not hold in $\mathbb{I}_4[x]$. ◄

Let $k$ be a field, and assume that $f(x) \in k[x]$ splits; that is, there are
$a, r_1, \ldots, r_n \in k$ with

\[ f(x) = a \prod_{i=1}^{n} (x - r_i). \]

If $r_1, \ldots, r_s$, where $s \le n$, are the distinct roots of $f(x)$, then collecting terms
gives

\[ f(x) = a(x - r_1)^{e_1}(x - r_2)^{e_2} \cdots (x - r_s)^{e_s}, \]

where $e_j \ge 1$ for all $j$. We call $e_j$ the multiplicity of the root $r_j$. As linear polynomials
are always irreducible, unique factorization shows that multiplicities of
roots are well-defined.

Remark. A domain $R$ is called a UFD, unique factorization domain, if every
nonzero nonunit $r \in R$ is a product of irreducibles and, moreover, such a factorization
of $r$ is essentially unique. The domains $\mathbb{Z}[\zeta_p]$ in Example 3.10, where
$\zeta_p = e^{2\pi i/p}$ for $p$ an odd prime, are quite interesting in this regard. Positive
integers $a, b, c$ for which

\[ a^2 + b^2 = c^2, \]

for example, 3, 4, 5 and 5, 12, 13, are called Pythagorean triples, and they have
been recognized for four thousand years (a Babylonian tablet of roughly this
age has been found containing a dozen of them), and they were classified by
Diophantus about two thousand years ago. Around 1637, Fermat wrote in the
margin of his copy of a book by Diophantus what is nowadays called Fermat’s
Last Theorem: for all integers $n \ge 3$, there do not exist positive integers $a, b, c$
for which

\[ a^n + b^n = c^n. \]

Fermat claimed that he had a wonderful proof of this result, but that the margin
was too small to contain it. Elsewhere, he did prove this result for $n = 4$ and,
later, others proved it for small values of $n$. However, the general statement
challenged mathematicians for centuries.

Call a positive integer $n \ge 2$ good if there are no positive integers $a, b, c$
with $a^n + b^n = c^n$. If $n$ is good, then so is any multiple $nk$ of it. Otherwise,
there are positive integers $r, s, t$ with $r^{nk} + s^{nk} = t^{nk}$; this gives the contradiction
$a^n + b^n = c^n$, where $a = r^k$, $b = s^k$, and $c = t^k$. For example, any integer of
the form $4k$ is good. Since every positive integer is a product of primes, Fermat’s
Last Theorem would follow if every odd prime is good.

As in Exercise 3.78 on page 278, a solution $a^p + b^p = c^p$, for $p$ an odd
prime, gives a factorization

\[ c^p = (a + b)(a + \zeta b)(a + \zeta^2 b) \cdots (a + \zeta^{p-1} b), \]

where $\zeta = \zeta_p = e^{2\pi i/p}$. In the 1840s, E. Kummer considered this factorization
in the domain $\mathbb{Z}[\zeta_p]$ (described in Example 3.10 on page 222). He proved that
if unique factorization into irreducibles holds in $\mathbb{Z}[\zeta_p]$, then there do not exist
positive integers $a, b, c$ (none of which is divisible by $p$) with $a^p + b^p = c^p$.
Kummer realized, however, that even though unique factorization does hold in
$\mathbb{Z}[\zeta_p]$ for some primes $p$, it does not hold in all $\mathbb{Z}[\zeta_p]$. To extend his proof,
he invented what he called “ideal numbers,” and he proved that there is unique
factorization of ideal numbers as products of “prime ideal numbers.” These ideal
numbers motivated R. Dedekind to define ideals in arbitrary commutative rings
(our definition of ideal is that of Dedekind), and he proved that ideals in the
special rings $\mathbb{Z}[\zeta_p]$ correspond to Kummer’s ideal numbers. Over the years these
investigations have been vastly developed, and at last, in 1995, A. Wiles proved
Fermat’s Last Theorem. ◄

There are formulas for gcd’s and lcm’s of two polynomials in k[x]. 

Proposition 3.86. Let $k$ be a field and let $g(x) = ap_1^{e_1} \cdots p_n^{e_n} \in k[x]$ and let
$h(x) = bp_1^{f_1} \cdots p_n^{f_n} \in k[x]$, where $a, b \in k$, the $p_i$ are distinct monic irreducible
polynomials, and $e_i, f_i \ge 0$ for all $i$. Define

\[ m_i = \min\{e_i, f_i\} \quad\text{and}\quad M_i = \max\{e_i, f_i\}. \]

Then

\[ (g, h) = p_1^{m_1} \cdots p_n^{m_n} \quad\text{and}\quad [g, h] = p_1^{M_1} \cdots p_n^{M_n}. \]

Proof. The proof is just an adaptation of the proof of Proposition 1.52. •
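The formulas of Proposition 3.86 amount to bookkeeping on exponents. In the sketch below a factorization is modeled, purely for illustration, as a dictionary from a label for each monic irreducible to its exponent; the labels and the function name are hypothetical, and no polynomial arithmetic is performed.

```python
def gcd_lcm_exponents(e, f):
    """Given exponent dictionaries for g and h, return those of (g, h) and [g, h]."""
    irreducibles = sorted(set(e) | set(f))       # allow zero exponents
    gcd_exp = {p: min(e.get(p, 0), f.get(p, 0)) for p in irreducibles}
    lcm_exp = {p: max(e.get(p, 0), f.get(p, 0)) for p in irreducibles}
    return gcd_exp, lcm_exp

# g = (x - 1)^2 (x + 1),  h = (x - 1)(x^2 + 1)^3  in Q[x]
g = {"x-1": 2, "x+1": 1}
h = {"x-1": 1, "x^2+1": 3}
gcd_exp, lcm_exp = gcd_lcm_exponents(g, h)
print(gcd_exp)  # {'x+1': 0, 'x-1': 1, 'x^2+1': 0}
print(lcm_exp)  # {'x+1': 1, 'x-1': 2, 'x^2+1': 3}
```

Since $m_i + M_i = e_i + f_i$ for every $i$, the same bookkeeping shows that $(g, h)[g, h]$ and $gh$ have the same monic irreducible factors with the same exponents.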

The next result is an analog of Proposition 1.44: for $b \ge 2$, every positive
integer has an expression in base $b$.

Lemma 3.87. Let $k$ be a field, and let $b(x) \in k[x]$ have $\deg(b) \ge 1$. Each
nonzero $f(x) \in k[x]$ has an expression

\[ f(x) = d_m(x)b(x)^m + \cdots + d_j(x)b(x)^j + \cdots + d_0(x), \]

where, for every $j$, either $d_j(x) = 0$ or $\deg(d_j) < \deg(b)$.

Proof. By the division algorithm, there are $g(x), d_0(x) \in k[x]$ with

\[ f(x) = g(x)b(x) + d_0(x), \]

where either $d_0(x) = 0$ or $\deg(d_0) < \deg(b)$. Now $\deg(f) = \deg(gb)$, so that
$\deg(b) \ge 1$ gives $\deg(g) < \deg(f)$. By induction, there are $d_j(x) \in k[x]$ with
each $d_j(x) = 0$ or $\deg(d_j) < \deg(b)$, such that

\[ g(x) = d_m b^{m-1} + \cdots + d_2 b + d_1. \]

Therefore,

\begin{align*}
f = gb + d_0 &= (d_m b^{m-1} + \cdots + d_2 b + d_1)b + d_0 \\
             &= d_m b^m + \cdots + d_2 b^2 + d_1 b + d_0. \quad •
\end{align*}

Just as for integers, it can be proved that the “digits” $d_i(x)$ are unique (see
Proposition 1.44).
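The proof of Lemma 3.87 is effectively an algorithm: divide repeatedly by $b(x)$ and collect the remainders as digits. The following Python sketch does this over $k = \mathbb{Q}$. The representation (coefficient lists, lowest degree first, with the empty list for the zero polynomial) and all function names are assumptions made here for illustration.

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zero coefficients; the zero polynomial becomes []."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def poly_divmod(f, b):
    """Division algorithm in Q[x]: return (q, r) with f = q*b + r, deg r < deg b."""
    f = [Fraction(c) for c in trim(f)]
    b = trim(b)
    q = [Fraction(0)] * max(len(f) - len(b) + 1, 1)
    while len(f) >= len(b) and any(f):
        shift = len(f) - len(b)
        coef = f[-1] / b[-1]
        q[shift] = coef
        for i, c in enumerate(b):       # subtract coef * x^shift * b(x)
            f[shift + i] -= coef * c
        f.pop()                         # the leading term is now zero
    return trim(q), trim(f)

def base_b_digits(f, b):
    """Return [d_0, d_1, ..., d_m] with f = sum of d_j * b**j."""
    digits = []
    f = trim(f)
    while f:
        f, d = poly_divmod(f, b)
        digits.append(d)
    return digits

def evaluate(p, x):
    """Value of the polynomial p at x."""
    return sum(c * x**i for i, c in enumerate(p))

# f = x^3 + 2x + 5 in base b = x^2 + 1:  f = (x) * b + (x + 5)
print(base_b_digits([5, 2, 0, 1], [1, 0, 1]))
```

Here the digits are $d_0 = x + 5$ and $d_1 = x$, both of degree less than $\deg(b) = 2$.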
Definition. Polynomials $q_1(x), \ldots, q_n(x)$ in $k[x]$, where $k$ is a field, are pairwise
relatively prime if $(q_i, q_j) = 1$ for all $i \neq j$.

It is easy to see that if $q_1(x), \ldots, q_m(x)$ are pairwise relatively prime, then
$q_1(x)$ and the product $q_2(x) \cdots q_m(x)$ are relatively prime.


Lemma 3.88. Let $k$ be a field, let $f(x)/g(x) \in k(x)$, and suppose that $g(x) =
q_1(x) \cdots q_m(x)$, where $q_1(x), \ldots, q_m(x) \in k[x]$ are pairwise relatively prime.
Then there are $a_i(x) \in k[x]$ with

\[ \frac{f(x)}{g(x)} = \sum_{i=1}^{m} \frac{a_i(x)}{q_i(x)}. \]

Proof. The proof is by induction on $m \ge 1$. The base step $m = 1$ is clearly
true. Since $q_1$ and $q_2 \cdots q_m$ are relatively prime, there are polynomials $s$ and $t$
with $1 = sq_1 + tq_2 \cdots q_m$. Therefore,

\begin{align*}
\frac{f}{g} &= (sq_1 + tq_2 \cdots q_m)\frac{f}{g} \\
            &= sq_1\frac{f}{g} + tq_2 \cdots q_m\frac{f}{g} \\
            &= \frac{sq_1 f}{q_1 \cdots q_m} + \frac{tq_2 \cdots q_m f}{q_1 q_2 \cdots q_m} \\
            &= \frac{sf}{q_2 \cdots q_m} + \frac{tf}{q_1}.
\end{align*}

The polynomials $q_2(x), \ldots, q_m(x)$ are pairwise relatively prime, and the inductive
hypothesis now rewrites the first summand. •

We now prove the algebraic portion of the method of partial fractions used 
in calculus to integrate rational functions. 


Theorem 3.89 (Partial Fractions). Let $k$ be a field, and let the factorization
into irreducibles of a monic polynomial $g(x) \in k[x]$ be

\[ g(x) = p_1(x)^{e_1} \cdots p_m(x)^{e_m}. \]

If $f(x)/g(x) \in k(x)$, then

\[ \frac{f(x)}{g(x)} = h(x) + \sum_{i=1}^{m} \left( \frac{d_{i1}(x)}{p_i(x)} + \frac{d_{i2}(x)}{p_i(x)^2} + \cdots + \frac{d_{ie_i}(x)}{p_i(x)^{e_i}} \right), \]

where $h(x) \in k[x]$ and either $d_{ij}(x) = 0$ or $\deg(d_{ij}) < \deg(p_i)$ for all $j$.
Proof. Clearly, the polynomials $p_1(x)^{e_1}, p_2(x)^{e_2}, \ldots, p_m(x)^{e_m}$ are pairwise
relatively prime. By Lemma 3.88, there are $a_i(x) \in k[x]$ with

\[ \frac{f(x)}{g(x)} = \sum_{i=1}^{m} \frac{a_i(x)}{p_i(x)^{e_i}}. \]

For each $i$, the division algorithm gives polynomials $Q_i(x)$ and $R_i(x)$ with
$a_i(x) = Q_i(x)p_i(x)^{e_i} + R_i(x)$, where either $R_i(x) = 0$ or $\deg(R_i) < \deg(p_i(x)^{e_i})$.
Hence,

\[ \frac{a_i(x)}{p_i(x)^{e_i}} = Q_i(x) + \frac{R_i(x)}{p_i(x)^{e_i}}. \]

By Lemma 3.87,

\[ R_i(x) = d_{im}(x)p_i(x)^m + d_{i,m-1}(x)p_i(x)^{m-1} + \cdots + d_{i0}(x), \]

where, for all $j$, either $d_{ij}(x) = 0$ or $\deg(d_{ij}) < \deg(p_i)$; moreover, since
$\deg(R_i) < \deg(p_i^{e_i})$, we have $m < e_i$. Therefore,

\begin{align*}
\frac{a_i(x)}{p_i(x)^{e_i}} &= Q_i(x) + \frac{d_{im}(x)p_i(x)^m + d_{i,m-1}(x)p_i(x)^{m-1} + \cdots + d_{i0}(x)}{p_i(x)^{e_i}} \\
&= Q_i(x) + \frac{d_{im}(x)p_i(x)^m}{p_i(x)^{e_i}} + \frac{d_{i,m-1}(x)p_i(x)^{m-1}}{p_i(x)^{e_i}} + \cdots + \frac{d_{i0}(x)}{p_i(x)^{e_i}}.
\end{align*}

After cancellation, each of the summands $d_{ij}(x)p_i(x)^j/p_i(x)^{e_i}$ is either a polynomial
or a rational function of the form $d_{ij}(x)/p_i(x)^s$, where $1 \le s \le e_i$. If we
call $h(x)$ the sum of all those summands that are polynomials, then
this is the desired expression. •

It is known that the only irreducible polynomials in $\mathbb{R}[x]$ are linear or quadratic,
so that all the numerators in the partial fraction decomposition in $\mathbb{R}[x]$
are either constant or linear. Theorem 3.89 is used in proving that all rational
functions in $\mathbb{R}(x)$ can be integrated in closed form using logs and arctans.

Here is the integer version of partial fractions. If $a/b$ is a positive rational
number and if the prime factorization of $b$ is $b = p_1^{e_1} \cdots p_m^{e_m}$, then

\[ \frac{a}{b} = h + \sum_{i=1}^{m} \left( \frac{c_{i1}}{p_i} + \frac{c_{i2}}{p_i^2} + \cdots + \frac{c_{ie_i}}{p_i^{e_i}} \right), \]

where $h \in \mathbb{Z}$ and $0 \le c_{ij} < p_i$ for all $j$.
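The integer version can be computed along the lines of the polynomial proof: first split $a/b$ over the prime-power denominators (the analog of Lemma 3.88), then write each numerator in base $p_i$ (the analog of Lemma 3.87). The Python sketch below is one way to do this, under stated assumptions: it uses trial division to factor $b$, the modular inverse form of pow (Python 3.8 or later), and function names chosen here.

```python
from fractions import Fraction

def prime_power_factors(b):
    """Return [(p, e), ...] with b equal to the product of p**e (trial division)."""
    factors = []
    p = 2
    while p * p <= b:
        if b % p == 0:
            e = 0
            while b % p == 0:
                b //= p
                e += 1
            factors.append((p, e))
        p += 1
    if b > 1:
        factors.append((b, 1))
    return factors

def integer_partial_fractions(a, b):
    """Return (h, terms): a/b = h + sum of c / p**j over terms (c, p, j), 0 < c < p."""
    terms = []
    for p, e in prime_power_factors(b):
        q = p ** e
        # Numerator a_i with a/b - a_i/q having denominator prime to p.
        a_i = a * pow(b // q, -1, q) % q
        # Base-p digits of a_i: the digit of p**k contributes digit / p**(e-k).
        for j in range(e, 0, -1):
            a_i, c = divmod(a_i, p)
            if c:
                terms.append((c, p, j))
    h = Fraction(a, b) - sum(Fraction(c, p ** j) for c, p, j in terms)
    assert h.denominator == 1           # the leftover part is an integer
    return int(h), terms

print(integer_partial_fractions(5, 12))
# (-1, [(1, 2, 2), (1, 2, 1), (2, 3, 1)])
```

That is, $5/12 = -1 + 1/2 + 1/4 + 2/3$. Note that the integer $h$ may be negative even though $a/b$ is positive.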
Exercises

3.76 In $k[x]$, where $k$ is a field, let $f = p_1^{e_1} \cdots p_n^{e_n}$ and $g = p_1^{\varepsilon_1} \cdots p_n^{\varepsilon_n}$, where the $p_i$’s
are distinct monic irreducibles and $e_i, \varepsilon_i \ge 0$ for all $i$ (as with integers, the device
of allowing zero exponents allows us to have the same irreducible factors in the
two factorizations). Prove that $f \mid g$ if and only if $e_i \le \varepsilon_i$ for all $i$.

*3.77 (i) If $f(x) \in \mathbb{R}[x]$, show that $f(x)$ has no repeated roots in $\mathbb{C}$ if and only if
$(f, f') = 1$.

(ii) Prove that if $p(x) \in \mathbb{Q}[x]$ is an irreducible polynomial, then $p(x)$ has no
repeated roots.

*3.78 Let $\zeta = e^{2\pi i/n}$.

(i) Prove that

\[ x^n - 1 = (x - 1)(x - \zeta)(x - \zeta^2) \cdots (x - \zeta^{n-1}) \]

and, if $n$ is odd, that

\[ x^n + 1 = (x + 1)(x + \zeta)(x + \zeta^2) \cdots (x + \zeta^{n-1}). \]

(ii) For numbers $a$ and $b$, prove that

\[ a^n - b^n = (a - b)(a - \zeta b)(a - \zeta^2 b) \cdots (a - \zeta^{n-1} b) \]

and, if $n$ is odd, that

\[ a^n + b^n = (a + b)(a + \zeta b)(a + \zeta^2 b) \cdots (a + \zeta^{n-1} b). \]

3.79 If $R$ is a PID, prove that $R$ is a unique factorization domain.


3.7 Irreducibility

Although there are some techniques to help decide whether an integer is prime, 
the general problem of factoring (large) integers is a very difficult one. It is 
also very difficult to determine whether a polynomial is irreducible, but we now 
present some useful techniques that frequently work. 

We know that if $f(x) \in k[x]$ and $r$ is a root of $f(x)$ in a field $k$, then there
is a factorization $f(x) = (x - r)g(x)$ in $k[x]$, and so $f(x)$ is not irreducible.
In Corollary 3.65, we saw that this decides the matter for quadratic and cubic
polynomials in $k[x]$: such polynomials are irreducible in $k[x]$ if and only if they
have no roots in $k$.

Theorem 3.90. Let $f(x) = a_0 + a_1 x + \cdots + a_n x^n \in \mathbb{Z}[x] \subseteq \mathbb{Q}[x]$. Every
rational root $r$ of $f(x)$ has the form $r = b/c$, where $b \mid a_0$ and $c \mid a_n$.

Proof. We may assume that $r = b/c$ is in lowest terms, that is, $(b, c) = 1$.
Substituting $r$ into $f(x)$ gives

\[ 0 = f(b/c) = a_0 + a_1 b/c + \cdots + a_n b^n/c^n, \]

and multiplying through by $c^n$ gives

\[ 0 = a_0 c^n + a_1 b c^{n-1} + \cdots + a_n b^n. \]

Hence, $a_0 c^n = b(-a_1 c^{n-1} - \cdots - a_n b^{n-1})$, that is, $b \mid a_0 c^n$. Since $b$ and $c$
are relatively prime, it follows that $b$ and $c^n$ are relatively prime, and so Euclid’s
lemma in $\mathbb{Z}$ gives $b \mid a_0$. Similarly, $a_n b^n = c(-a_{n-1} b^{n-1} - \cdots - a_0 c^{n-1})$,
$c \mid a_n b^n$, and $c \mid a_n$. •

Definition. A complex number $\alpha$ is called an algebraic integer if $\alpha$ is a root of
a monic polynomial $f(x) \in \mathbb{Z}[x]$.

We note that it is crucial, in the definition of algebraic integer, that $f(x) \in
\mathbb{Z}[x]$ be monic. Every algebraic number $z$, that is, every complex number $z$
that is a root of some polynomial $g(x) \in \mathbb{Q}[x]$, is necessarily a root of some
polynomial $h(x) \in \mathbb{Z}[x]$; just clear the denominators of the coefficients of $g(x)$.

Corollary 3.91. A rational number $z$ that is an algebraic integer must lie in
$\mathbb{Z}$. More precisely, if $f(x) \in \mathbb{Z}[x] \subseteq \mathbb{Q}[x]$ is a monic polynomial, then every
rational root of $f(x)$ is an integer that divides the constant term.

Proof. If $f(x) = a_0 + a_1 x + \cdots + a_n x^n$ is monic, then $a_n = 1$, and Theorem 3.90
applies at once. •

For example, consider $f(x) = x^3 + 4x^2 - 2x - 1 \in \mathbb{Q}[x]$. By Corollary 3.65,
this cubic is irreducible if and only if it has no rational root. As $f(x)$ is monic,
the candidates for rational roots are $\pm 1$, for these are the only divisors of $-1$ in
$\mathbb{Z}$. But $f(1) = 2$ and $f(-1) = 4$, so that neither $1$ nor $-1$ is a root. Thus, $f(x)$
has no roots in $\mathbb{Q}$, and hence $f(x)$ is irreducible in $\mathbb{Q}[x]$.
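Theorem 3.90 reduces the search for rational roots to finitely many candidates $b/c$, which can be tested mechanically. The following Python sketch does so; the function names are chosen here, coefficients are listed from $a_0$ up to $a_n$, and both $a_0$ and $a_n$ are assumed nonzero.

```python
from fractions import Fraction

def divisors(n):
    """Positive divisors of the nonzero integer n."""
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]

def rational_roots(coeffs):
    """Rational roots of a_0 + a_1 x + ... + a_n x^n with integer coefficients.

    coeffs[i] is a_i; a_0 and a_n must be nonzero."""
    a0, an = coeffs[0], coeffs[-1]
    candidates = {Fraction(sign * b, c)
                  for b in divisors(a0)       # b | a_0
                  for c in divisors(an)       # c | a_n
                  for sign in (1, -1)}
    return sorted(r for r in candidates
                  if sum(a * r**i for i, a in enumerate(coeffs)) == 0)

print(rational_roots([-1, -2, 4, 1]))   # x^3 + 4x^2 - 2x - 1: prints []
print(rational_roots([1, -3, 2]))       # 2x^2 - 3x + 1: roots 1/2 and 1
```

The empty list for the first polynomial agrees with the computation above: the only candidates are $\pm 1$, and neither is a root.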

This corollary gives a new solution of Exercise 1.64 on page 56. If $m$ is an
integer that is not a perfect square, then the polynomial $x^2 - m$ has no integer
roots, and so $\sqrt{m}$ is irrational. Indeed, the reader can now generalize to $n$th roots:
if $m$ is not an $n$th power of an integer, then $\sqrt[n]{m}$ is irrational, for any rational root
of $x^n - m$ must be an integer.

The next proposition gives a way of proving that a complex number is an
algebraic integer; moreover, it shows that the sum and product of algebraic integers
is another such (if $\alpha$ and $\beta$ are algebraic integers, then it is not too difficult
to give monic polynomials having $\alpha + \beta$ and $\alpha\beta$ as roots, but it takes a bit of
work to find such polynomials having all coefficients in $\mathbb{Z}$).
Let $\alpha \in \mathbb{C}$. If $\mathbb{Z}[\alpha]$ is the smallest subring of $\mathbb{C}$ containing $\alpha$, then Exercise 3.32
on page 239 shows that

\[ \mathbb{Z}[\alpha] = \{f(\alpha) : f(x) \in \mathbb{Z}[x]\}. \]

We have seen $\mathbb{Z}[\alpha]$ in the special case $\alpha = \zeta_n$ (in Example 3.10).

Proposition 3.92.

(i) If $\alpha \in \mathbb{C}$, then $\alpha$ is an algebraic integer if and only if $\mathbb{Z}[\alpha]$ contains a
finite subset $x_1, \ldots, x_n$ such that every $\beta \in \mathbb{Z}[\alpha]$ is a linear combination
$\beta = \sum_i m_i x_i$ with all $m_i \in \mathbb{Z}$.

(ii) The set of all algebraic integers is a subring of $\mathbb{C}$.

Remark. In Exercise 2.89 on page 187, we defined finitely generated abelian
groups, and said that every subgroup of a finitely generated abelian group is itself
finitely generated. Forgetting its multiplication, every ring is an abelian group
under addition, and the condition in part (i) says that the additive group of $\mathbb{Z}[\alpha]$
is finitely generated. ◄

Proof.

(i) If $\alpha$ is an algebraic integer, then there is a monic $f(x) = x^n + b_{n-1}x^{n-1} +
\cdots + b_0 \in \mathbb{Z}[x]$ having $\alpha$ as a root. Let

\[ G = \{m_{n-1}\alpha^{n-1} + \cdots + m_2\alpha^2 + m_1\alpha + m_0 \cdot 1 : m_i \in \mathbb{Z}\}. \]

Clearly, $G \subseteq \mathbb{Z}[\alpha]$. For the reverse inclusion, note that $\alpha^k \in G$ for all $k < n$.
We now show that $\alpha^k \in G$ for all $k \ge n$ by induction on $k$. If $k = n$, then
$\alpha^n = -(b_{n-1}\alpha^{n-1} + \cdots + b_1\alpha + b_0) \in G$ (notice how we have used the fact
that $f(x)$ is monic). For the inductive step, assume that there are integers $c_i$ so
that $\alpha^k = c_{n-1}\alpha^{n-1} + \cdots + c_1\alpha + c_0$. Then

\begin{align*}
\alpha^{k+1} &= \alpha\alpha^k \\
&= c_{n-1}\alpha^n + c_{n-2}\alpha^{n-1} + \cdots + c_1\alpha^2 + c_0\alpha \\
&= c_{n-1}\bigl[-(b_{n-1}\alpha^{n-1} + \cdots + b_2\alpha^2 + b_1\alpha + b_0)\bigr] \\
&\qquad + c_{n-2}\alpha^{n-1} + \cdots + c_1\alpha^2 + c_0\alpha,
\end{align*}

which lies in $G$. By Exercise 3.32 on page 239, if $\beta \in \mathbb{Z}[\alpha]$, then $\beta = f(\alpha)$ for
some $f(x) \in \mathbb{Z}[x]$, and so $\beta \in G$.

Conversely, suppose that $\mathbb{Z}[\alpha]$ consists of all the $\mathbb{Z}$-linear combinations of
elements $x_1, \ldots, x_s$; in particular, each $x_j = \sum_i a_{ji}\alpha^i$, where all $a_{ji} \in \mathbb{Z}$.
Let $M$ be the largest power of $\alpha$ occurring in any of these expressions. Now
$\alpha^{M+1} \in \mathbb{Z}[\alpha]$, so that it can be expressed as a $\mathbb{Z}$-linear combination of smaller
powers of $\alpha$; say, $\alpha^{M+1} = \sum_{k=0}^{M} b_k\alpha^k$, where all $b_k \in \mathbb{Z}$. Therefore, $\alpha$ is a root
of $f(x) = x^{M+1} - \sum_{k=0}^{M} b_k x^k$, which is a monic polynomial in $\mathbb{Z}[x]$, and so $\alpha$
is an algebraic integer.

(ii) Suppose that $\alpha$ and $\beta$ are algebraic integers; let $\alpha$ be a root of a monic $f(x) \in
\mathbb{Z}[x]$ of degree $n$, and let $\beta$ be a root of a monic $g(x) \in \mathbb{Z}[x]$ of degree $m$. The
additive group $A$ of $\mathbb{C}$ generated by all $\alpha^i\beta^j$, where $0 \le i < n$ and $0 \le j < m$,
is a finitely generated abelian subgroup containing $\mathbb{Z}[\alpha\beta]$ and $\mathbb{Z}[\alpha + \beta]$ as
subgroups. By Exercise 2.89 on page 187, every subgroup of a finitely generated
abelian group is finitely generated, and so both $\alpha\beta$ and $\alpha + \beta$ are algebraic
integers, by part (i). •
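The proof above is an existence argument. One standard way to exhibit an explicit monic polynomial, which is not the method of the proof, is to write multiplication by $\gamma = \alpha + \beta$ as an integer matrix on the generators $\alpha^i\beta^j$ of $A$ and take its characteristic polynomial; this is monic with integer coefficients and, by the Cayley-Hamilton theorem, has $\gamma$ as a root. The Python sketch below does this for $\alpha = \sqrt{2}$, $\beta = \sqrt{3}$, with generators $1, \sqrt{2}, \sqrt{3}, \sqrt{6}$, using the Faddeev-LeVerrier recurrence; the function names are chosen here.

```python
def mat_mul(A, B):
    """Product of two square integer matrices of the same size."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def char_poly(A):
    """Coefficients of det(xI - A), highest degree first (Faddeev-LeVerrier)."""
    n = len(A)
    coeffs = [1]
    M = [[0] * n for _ in range(n)]
    c = 1
    for k in range(1, n + 1):
        AM = mat_mul(A, M)
        # M_k = A*M_{k-1} + c*I, where c is the previously found coefficient.
        M = [[AM[i][j] + (c if i == j else 0) for j in range(n)] for i in range(n)]
        AM = mat_mul(A, M)
        c = -sum(AM[i][i] for i in range(n)) // k   # the division is exact
        coeffs.append(c)
    return coeffs

# Column j holds the coordinates of gamma * e_j, where gamma = sqrt2 + sqrt3 and
# (e_0, e_1, e_2, e_3) = (1, sqrt2, sqrt3, sqrt6):
#   gamma*1 = sqrt2 + sqrt3,       gamma*sqrt2 = 2 + sqrt6,
#   gamma*sqrt3 = 3 + sqrt6,       gamma*sqrt6 = 3*sqrt2 + 2*sqrt3.
A = [[0, 2, 3, 0],
     [1, 0, 0, 3],
     [1, 0, 0, 2],
     [0, 1, 1, 0]]

print(char_poly(A))  # [1, 0, -10, 0, 1]
```

Thus $\sqrt{2} + \sqrt{3}$ is a root of the monic polynomial $x^4 - 10x^2 + 1 \in \mathbb{Z}[x]$.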

We are now going to find several conditions that imply that a polynomial
$f(x) \in \mathbb{Z}[x]$ does not factor in $\mathbb{Z}[x]$ as a product of polynomials of smaller
degree. Since $\mathbb{Z}$ is not a field, this does not force $f(x)$ to be irreducible in $\mathbb{Z}[x]$; for
example, $f(x) = 2x + 2$ does not so factor in $\mathbb{Z}[x]$, yet it is not irreducible there.
However, C. F. Gauss (1777-1855) proved that if $f(x) \in \mathbb{Z}[x]$ does not factor
as a product of polynomials in $\mathbb{Z}[x]$ of smaller degree, then $f(x)$ is irreducible
in $\mathbb{Q}[x]$. We prove this result after first proving several lemmas.

Definition. A polynomial $f(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n \in \mathbb{Z}[x]$ is
called primitive if the gcd of its coefficients is 1.

Of course, every monic polynomial in $\mathbb{Z}[x]$ is primitive. It is easy to see
that if $d$ is the gcd of the coefficients of $f(x)$, then $(1/d)f(x)$ is a primitive
polynomial in $\mathbb{Z}[x]$.

Observe that if $f(x)$ is not primitive, then there exists a prime $p$ that divides
each of its coefficients: if the gcd is $d > 1$, then take for $p$ any prime divisor
of $d$.

Lemma 3.93 (Gauss’s Lemma). If $f(x), g(x) \in \mathbb{Z}[x]$ are both primitive,
then their product $f(x)g(x)$ is also primitive.

Proof. Let $f(x) = \sum a_i x^i$, $g(x) = \sum b_j x^j$, and $f(x)g(x) = \sum c_k x^k$. If
$f(x)g(x)$ is not primitive, then there is a prime $p$ that divides every $c_k$. Since
$f(x)$ is primitive, at least one of its coefficients is not divisible by $p$; let $a_i$ be
the first such (i.e., $i$ is the smallest index of such a coefficient). Similarly, let
$b_j$ be the first coefficient of $g(x)$ that is not divisible by $p$. The definition of
multiplication of polynomials gives

\[ a_i b_j = c_{i+j} - (a_0 b_{i+j} + \cdots + a_{i-1}b_{j+1} + a_{i+1}b_{j-1} + \cdots + a_{i+j}b_0). \]

Each term on the right side is divisible by $p$, and so $p$ divides $a_i b_j$. As $p$ divides
neither $a_i$ nor $b_j$, however, this contradicts Euclid’s lemma in $\mathbb{Z}$. •
Here is a more elegant proof of this last lemma; after all, the hypothesis that
a polynomial $h(x) \in \mathbb{Z}[x]$ is not primitive says that all its coefficients are 0 in
$\mathbb{F}_p$ for some prime $p$. Assume that the product $f(x)g(x)$ is not primitive, so
there is some prime $p$ dividing each of its coefficients. If $\varphi : \mathbb{Z} \to \mathbb{F}_p$ is the
natural map $a \mapsto [a]$, then Exercise 3.44 on page 248 shows that the function
$\varphi^* : \mathbb{Z}[x] \to \mathbb{F}_p[x]$, which reduces all the coefficients of a polynomial mod
$p$, is a ring homomorphism. In particular, since $p$ divides every coefficient of
$f(x)g(x)$, we have $0 = \varphi^*(fg) = \varphi^*(f)\varphi^*(g)$ in $\mathbb{F}_p[x]$. On the other hand,
neither $\varphi^*(f)$ nor $\varphi^*(g)$ is 0, because they are primitive. We have contradicted
the fact that $\mathbb{F}_p[x]$ is a domain.

Lemma 3.94. Every nonzero $f(x) \in \mathbb{Q}[x]$ has a unique factorization

\[ f(x) = c(f)f^{\#}(x), \]

where $c(f) \in \mathbb{Q}$ is positive and $f^{\#}(x) \in \mathbb{Z}[x]$ is primitive.

Proof. There are integers $a_i$ and $b_i$ with

\[ f(x) = (a_0/b_0) + (a_1/b_1)x + \cdots + (a_n/b_n)x^n \in \mathbb{Q}[x]. \]

Define $B = b_0 b_1 \cdots b_n$, so that $g(x) = Bf(x) \in \mathbb{Z}[x]$. Now define $D = \pm d$,
where $d$ is the gcd of the coefficients of $g(x)$; the sign is chosen to make the
rational number $D/B$ positive. Now $(B/D)f(x) = (1/D)g(x)$ lies in $\mathbb{Z}[x]$, and
it is a primitive polynomial. If we define $c(f) = D/B$ and $f^{\#}(x) = (B/D)f(x)$,
then $f(x) = c(f)f^{\#}(x)$ is a desired factorization.

Suppose that $f(x) = eh(x)$ is a second such factorization, so that $e$ is a positive
rational number and $h(x) \in \mathbb{Z}[x]$ is primitive. Now $c(f)f^{\#}(x) = f(x) =
eh(x)$, so that $f^{\#}(x) = [e/c(f)]h(x)$. Write $e/c(f)$ in lowest terms: $e/c(f)
= u/v$, where $u$ and $v$ are relatively prime positive integers. The equation
$vf^{\#}(x) = uh(x)$ holds in $\mathbb{Z}[x]$; equating like coefficients, $v$ is a common divisor
of each coefficient of $uh(x)$. Since $(u, v) = 1$, Euclid’s lemma in $\mathbb{Z}$ shows
that $v$ is a (positive) common divisor of the coefficients of $h(x)$. Since $h(x)$ is
primitive, it follows that $v = 1$. A similar argument shows that $u = 1$. Since
$e/c(f) = u/v = 1$, we have $e = c(f)$ and hence $h(x) = f^{\#}(x)$. •

Definition. The rational c(f) in Lemma 3.94 is called the content of f(x).

Corollary 3.95. If f(x) ∈ Z[x], then c(f) ∈ Z.

Proof. If d is the gcd of the coefficients of f(x), then (1/d)f(x) ∈ Z[x] is
primitive. Since d[(1/d)f(x)] is a factorization of f(x) as the product of a
positive rational d (which is even a positive integer) and a primitive polynomial,
the uniqueness in the lemma gives c(f) = d ∈ Z. •

Corollary 3.96. If f(x) ∈ Q[x] factors as f(x) = g(x)h(x), then
c(f) = c(g)c(h) and f^#(x) = g^#(x)h^#(x).

Proof. We have

f(x) = g(x)h(x)
c(f) f^#(x) = [c(g) g^#(x)][c(h) h^#(x)]
            = c(g)c(h) g^#(x)h^#(x).

Since g^#(x)h^#(x) is primitive, by Lemma 3.93, and c(g)c(h) is a positive
rational, the uniqueness of the factorization in Lemma 3.94 gives c(f) = c(g)c(h)
and f^#(x) = g^#(x)h^#(x). •
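The construction in the proof of Lemma 3.94 is easy to carry out by machine. The following is a minimal sketch, not part of the text: the function names content_and_primitive and poly_mul are our own, and a polynomial is assumed to be given as a list of coefficients a_0, ..., a_n (lowest degree first). It follows the proof step by step (clear denominators with B, divide by the gcd d) and can be used to check Corollary 3.96 on examples.

```python
from fractions import Fraction
from functools import reduce
from math import gcd, prod


def content_and_primitive(coeffs):
    """Return (c(f), f#) for a nonzero f in Q[x].

    coeffs lists a_0, ..., a_n (lowest degree first); entries may be ints
    or Fractions. The result satisfies f = c(f) * f#, with c(f) a positive
    rational and f# a primitive polynomial with integer coefficients.
    """
    coeffs = [Fraction(a) for a in coeffs]
    if all(a == 0 for a in coeffs):
        raise ValueError("the zero polynomial has no content")
    # B = product of the denominators, so that g = B*f lies in Z[x].
    B = prod(a.denominator for a in coeffs)
    g = [int(a * B) for a in coeffs]
    # d = gcd of the coefficients of g; B > 0 and d > 0, so d/B > 0.
    d = reduce(gcd, g)
    return Fraction(d, B), [gi // d for gi in g]


def poly_mul(f, g):
    """Product of two polynomials given as coefficient lists."""
    out = [Fraction(0)] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += Fraction(a) * Fraction(b)
    return out
```

For instance, for f(x) = 2/3 + (4/5)x the sketch gives c(f) = 2/15 and f^#(x) = 5 + 6x; multiplying f(x) by 6 + 9x^2 (content 3) gives a polynomial of content 2/5 = (2/15)·3, as Corollary 3.96 predicts.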

Theorem 3.97 (Gauss). Let f(x) ∈ Z[x]. If

f(x) = G(x)H(x) in Q[x],

then there is a factorization

f(x) = g(x)h(x) in Z[x],

where deg(g) = deg(G) and deg(h) = deg(H). Therefore, if f(x) does not
factor into polynomials of smaller degree in Z[x], then f(x) is irreducible in
Q[x].

Proof. By Corollary 3.96, there is a factorization

f(x) = c(G)c(H) G^#(x)H^#(x) in Q[x],

where G^#(x), H^#(x) ∈ Z[x] are primitive polynomials. But c(G)c(H) = c(f),
by Corollary 3.96, and c(f) ∈ Z, by Corollary 3.95 (which applies because
f(x) ∈ Z[x]). Therefore, f(x) = g(x)h(x) is a factorization in Z[x], where
g(x) = c(f)G^#(x) and h(x) = H^#(x). •

Remark. Gauss used these ideas to prove Theorem 7.21: the ring k[x_1, ..., x_n]
of all polynomials in n variables with coefficients in a field k is a unique
factorization domain. ◄

The next criterion uses the integers mod p. 

Theorem 3.98. Let f(x) = a_0 + a_1 x + a_2 x^2 + ··· + x^n ∈ Z[x] be monic,
and let p be a prime. If f*(x) = [a_0] + [a_1]x + [a_2]x^2 + ··· + x^n ∈ F_p[x] is
irreducible, then f(x) is irreducible in Q[x].

Proof. By Exercise 3.44 on page 248, the natural map φ : Z → F_p defines a
homomorphism φ* : Z[x] → F_p[x] by

φ*(b_0 + b_1 x + b_2 x^2 + ···) = [b_0] + [b_1]x + [b_2]x^2 + ···,

that is, just reduce all the coefficients mod p. If g(x) ∈ Z[x], denote its
image φ*(g(x)) ∈ F_p[x] by g*(x). Suppose that f(x) factors in Z[x]; say,
f(x) = g(x)h(x), where deg(g) < deg(f) and deg(h) < deg(f) [of course,
deg(f) = deg(g) + deg(h)]. Now f*(x) = g*(x)h*(x), because φ* is a ring
homomorphism, so that deg(f*) = deg(g*) + deg(h*). Since f(x) is monic,
f*(x) is also monic, and so deg(f*) = deg(f). (Of course, the hypothesis that
f(x) be monic can be relaxed; we may assume, instead, that p does not divide its
leading coefficient.) Thus, both g*(x) and h*(x) have degrees less than deg(f*),
contradicting the irreducibility of f*(x). Therefore, f(x) is irreducible in Z[x]
and, by Gauss's theorem, f(x) is irreducible in Q[x]. •
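The criterion of Theorem 3.98 can be tried mechanically. The sketch below is only an illustration under our own conventions: the names poly_rem and irreducible_mod_p are hypothetical, polynomials are lists of coefficients with the constant term first, and irreducibility in F_p[x] is tested by brute-force trial division by every monic polynomial of degree at most n/2. It is practical only for small p and small degree.

```python
from itertools import product


def poly_rem(f, g, p):
    """Remainder of f divided by a monic g in F_p[x] (lowest degree first)."""
    r = [a % p for a in f]
    while len(r) >= len(g):
        lead = r[-1]
        shift = len(r) - len(g)
        # Subtract lead * x^shift * g(x); this kills the leading term of r.
        for i, b in enumerate(g):
            r[shift + i] = (r[shift + i] - lead * b) % p
        r.pop()
    return r


def irreducible_mod_p(f, p):
    """Is f*(x), the reduction of f mod p, irreducible in F_p[x]?

    f lists integer coefficients a_0, ..., a_n; we assume p does not
    divide the leading coefficient a_n, so that deg(f*) = deg(f).
    """
    fbar = [a % p for a in f]
    n = len(fbar) - 1
    # A reducible polynomial of degree n has a monic factor of degree <= n/2.
    for d in range(1, n // 2 + 1):
        for lower in product(range(p), repeat=d):
            g = list(lower) + [1]
            if not any(poly_rem(fbar, g, p)):
                return False
    return True
```

For example, irreducible_mod_p([3, 2, 0, -5, 1], 2) tests the quartic x^4 - 5x^3 + 2x + 3 modulo 2. A True answer proves irreducibility in Q[x]; a False answer proves nothing, for the converse of the theorem fails.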

The converse of Theorem 3.98 is false; its criterion does not always work. It
is not difficult to find an irreducible f(x) ∈ Z[x] ⊆ Q[x] with f(x) factoring
mod p for some prime p, and Exercise 3.96 on page 304 shows that x^4 + 1 is an
irreducible polynomial in Q[x] that factors in F_p[x] for every prime p.
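One can at least observe the behavior of x^4 + 1 for small primes by a direct search. The sketch below is ours (the function name is hypothetical): it looks for monic quadratics x^2 + ax + b and x^2 + cx + d in F_p[x] whose product is x^4 + 1, by comparing the coefficients of

(x^2 + ax + b)(x^2 + cx + d) = x^4 + (a + c)x^3 + (b + d + ac)x^2 + (ad + bc)x + bd.

Such a search checks finitely many primes only; it is no substitute for the proof requested in the exercise.

```python
def quadratic_factors_of_x4_plus_1(p):
    """Search for (a, b), (c, d) with (x^2+ax+b)(x^2+cx+d) = x^4+1 in F_p[x].

    Returns the first pair found, or None if there is none.
    """
    for a in range(p):
        for b in range(p):
            for c in range(p):
                for d in range(p):
                    if ((a + c) % p == 0              # coefficient of x^3
                            and (b + d + a * c) % p == 0  # coefficient of x^2
                            and (a * d + b * c) % p == 0  # coefficient of x
                            and (b * d) % p == 1):        # constant term
                        return (a, b), (c, d)
    return None
```

For p = 2 the search returns ((0, 1), (0, 1)); that is, x^4 + 1 = (x^2 + 1)^2 in F_2[x].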

Theorem 3.98 says that if one can find a prime p with f*(x) irreducible in
F_p[x], then f(x) is irreducible in Q[x]. Until now, the finite fields F_p have been
oddities; F_p has appeared only as a curious artificial construct. Now the
finiteness of F_p is a genuine advantage, for there are only a finite number of
polynomials in F_p[x] of any given degree. In principle, then, one can test whether a
polynomial of degree n in F_p[x] is irreducible by just looking at all the possible
factorizations of it.

Since it becomes tiresome not to do so, we are now going to write the
elements of F_p without brackets.

Example 3.99. 

We determine the irreducible polynomials in F_2[x] of small degree.

As always, the linear polynomials x and x + 1 are irreducible.

There are four quadratics: x^2; x^2 + x; x^2 + 1; x^2 + x + 1 (more generally,
there are p^n monic polynomials of degree n in F_p[x], for there are p choices for
each of the n coefficients a_0, ..., a_{n-1}). Since each of the first three has a root
in F_2, there is only one irreducible quadratic.

There are eight cubics, of which four are reducible because their constant
term is 0. The remaining polynomials are

x^3 + 1; x^3 + x + 1; x^3 + x^2 + 1; x^3 + x^2 + x + 1.

Since 1 is a root of the first and fourth, the middle two are the only irreducible
cubics.

There are 16 quartics, of which eight are reducible because their constant
term is 0. Of the eight with nonzero constant term, those having an even
number of nonzero coefficients have 1 as a root. There are now only four surviving
polynomials f(x), and each of them has no roots in F_2; i.e., they have no
linear factors. If such an f(x) factors as f(x) = g(x)h(x), then both g(x) and h(x)
must be irreducible quadratics. But there is only one irreducible quadratic, namely,
x^2 + x + 1, and so (x^2 + x + 1)^2 = x^4 + x^2 + 1 is reducible while the other
three quartics are irreducible. The following list summarizes these observations.

Irreducible Polynomials of Low Degree over F_2

degree 2: x^2 + x + 1.

degree 3: x^3 + x + 1; x^3 + x^2 + 1.

degree 4: x^4 + x^3 + 1; x^4 + x + 1; x^4 + x^3 + x^2 + x + 1. ◄
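The bookkeeping in this example can be checked by a sieve: list every product of two monic polynomials of smaller positive degree, and keep the monic polynomials of degree n that never occur. The following sketch does this over F_p for a small prime p. The function names are ours, and a polynomial is represented by the tuple (a_0, a_1, ..., a_{n-1}, 1) of its coefficients.

```python
from itertools import product


def monic_polys(p, n):
    """All p^n monic polynomials of degree n over F_p, as tuples (a_0, ..., 1)."""
    return [lower + (1,) for lower in product(range(p), repeat=n)]


def poly_mul(f, g, p):
    """Product of f and g in F_p[x]."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % p
    return tuple(out)


def monic_irreducibles(p, n):
    """Monic irreducible polynomials of degree n over F_p, found by sieving."""
    reducible = set()
    # Every reducible monic polynomial has a monic factor of degree d <= n/2.
    for d in range(1, n // 2 + 1):
        for g in monic_polys(p, d):
            for h in monic_polys(p, n - d):
                reducible.add(poly_mul(g, h, p))
    return [f for f in monic_polys(p, n) if f not in reducible]
```

For instance, monic_irreducibles(2, 4) returns three tuples, among them (1, 1, 0, 0, 1), which stands for x^4 + x + 1.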


Example 3.100. 

Here is a list of the monic irreducible quadratics and cubics in F_3[x]. The reader
can verify that the list is correct by first enumerating all such polynomials; there
are 6 monic quadratics having nonzero constant term, and there are 18 monic
cubics having nonzero constant term. It must then be checked which of these
have 1 or -1 as a root (it is more convenient to write -1 instead of 2).

Monic Irreducible Quadratics and Cubics over F_3

degree 2: x^2 + 1; x^2 + x - 1; x^2 - x - 1.

degree 3: x^3 - x + 1; x^3 + x^2 - x + 1; x^3 - x^2 + 1;
          x^3 - x^2 + x + 1; x^3 + x^2 - 1; x^3 - x - 1;
          x^3 + x^2 + x - 1; x^3 - x^2 - x - 1. ◄


Example 3.101. 


(i) We show that f(x) = x^4 - 5x^3 + 2x + 3 is an irreducible polynomial in
Q[x]. By Corollary 3.91, the only candidates for rational roots of f(x) are
1, -1, 3, -3, and the reader may check that none of these is a root. Since
f(x) is a quartic, one cannot yet conclude that f(x) is irreducible, for it
might be a product of (irreducible) quadratics.

Let us try the criterion of Theorem 3.98. Since f*(x) = x^4 + x^3 + 1
in F_2[x] is irreducible, by Example 3.99, it follows that f(x) is irreducible
in Q[x]. [It was not necessary to check that f(x) has no rational roots;
irreducibility of f*(x) is enough to conclude irreducibility of f(x).]

(ii) Let Φ_5(x) = x^4 + x^3 + x^2 + x + 1 ∈ Q[x].

In Example 3.99, we saw that (Φ_5)*(x) = x^4 + x^3 + x^2 + x + 1 is
irreducible in F_2[x], and so Φ_5(x) is irreducible in Q[x]. ◄

Like any linear polynomial, Φ_2(x) = x + 1 is irreducible in Q[x]; Φ_3(x) =
x^2 + x + 1 is irreducible in Q[x] because it has no rational roots; we have just
seen that Φ_5(x) is irreducible in Q[x]. Let us introduce another irreducibility
criterion in order to prove that Φ_p(x) is irreducible in Q[x] for all primes p.

Lemma 3.102. Let g(x) ∈ Z[x]. If there is c ∈ Z with g(x + c) irreducible in
Z[x], then g(x) is irreducible in Q[x].

Proof. By Proposition 3.33, the function φ : Z[x] → Z[x], given by

f(x) ↦ f(x + c),

is an isomorphism. If g(x) = s(x)t(x), then g(x + c) = φ(g(x)) = φ(st) =
φ(s)φ(t) is a forbidden factorization of g(x + c). Therefore, g(x) is irreducible
in Z[x] and hence, by Gauss's theorem, g(x) is irreducible in Q[x]. •

Theorem 3.103 (Eisenstein Criterion).

Let f(x) = a_0 + a_1 x + ··· + a_n x^n ∈ Z[x]. If there is a prime p dividing a_i
for all i < n but with p ∤ a_n and p^2 ∤ a_0, then f(x) is irreducible in Q[x].

Proof. Assume, on the contrary, that

f(x) = (b_0 + b_1 x + ··· + b_m x^m)(c_0 + c_1 x + ··· + c_k x^k),

where m < n and k < n; by Theorem 3.97, we may assume that both factors
lie in Z[x]. Now p | a_0 = b_0 c_0, so that Euclid's lemma in Z gives p | b_0
or p | c_0; since p^2 ∤ a_0, only one of them is divisible by p, say, p | c_0 but
p ∤ b_0. By hypothesis, the leading coefficient a_n = b_m c_k is not divisible by p,
so that p does not divide c_k (or b_m). Let c_r be the first coefficient not divisible
by p (so that p does divide c_0, ..., c_{r-1}). If r < n, then p | a_r, and so b_0 c_r =
a_r - (b_1 c_{r-1} + ··· + b_r c_0) is also divisible by p. This contradicts Euclid's
lemma, for p | b_0 c_r, but p divides neither factor. It follows that r = n; hence
n ≥ k ≥ r = n, and so k = n, contradicting k < n. Therefore, f(x) is
irreducible in Q[x]. •

Remark. R. Singer found the following elegant proof of Eisenstein's criterion.

Let φ* : Z[x] → F_p[x] be the ring homomorphism that reduces coefficients
mod p, and let f*(x) denote φ*(f(x)). If f(x) is not irreducible in Q[x], then
Gauss's theorem gives polynomials g(x), h(x) ∈ Z[x] with f(x) = g(x)h(x),
where g(x) = b_0 + b_1 x + ··· + b_m x^m, h(x) = c_0 + c_1 x + ··· + c_k x^k, and
m, k > 0. There is thus an equation f*(x) = g*(x)h*(x) in F_p[x].

Since p ∤ a_n, we have f*(x) ≠ 0; in fact, f*(x) = u x^n for some unit
u ∈ F_p, because all its coefficients aside from its leading coefficient are 0. By
Theorem 3.84, unique factorization in F_p[x], we must have g*(x) = v x^m and
h*(x) = w x^k, where v and w are units in F_p, because the only monic divisors
of x^n are powers of x. It follows that each of g*(x) and h*(x) has constant term
0; that is, [b_0] = 0 = [c_0] in F_p; equivalently, p | b_0 and p | c_0. But a_0 = b_0 c_0,
and so p^2 | a_0, a contradiction. Therefore, f(x) is irreducible in Q[x]. ◄

Definition. If p is a prime, then the pth cyclotomic polynomial is

Φ_p(x) = (x^p - 1)/(x - 1) = x^{p-1} + x^{p-2} + ··· + x + 1.

Corollary 3.104 (Gauss). For every prime p, the pth cyclotomic polynomial
Φ_p(x) is irreducible in Q[x].

Remark. Every cyclotomic polynomial Φ_n(x), for every (not necessarily prime)
n ≥ 1 (defined on page 29), is irreducible in Q[x] (see Tignol, Galois' Theory
of Algebraic Equations, Theorem 12.31). ◄

Proof. Since Φ_p(x) = (x^p - 1)/(x - 1), we have

Φ_p(x + 1) = [(x + 1)^p - 1]/x
           = x^{p-1} + C(p,1) x^{p-2} + C(p,2) x^{p-3} + ··· + p,

where C(p,k) denotes the binomial coefficient. Since p is prime, Proposition 1.36
shows that Eisenstein's criterion applies; we conclude that Φ_p(x + 1) is
irreducible in Q[x]. By Lemma 3.102, Φ_p(x) is irreducible in Q[x]. •
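The hypotheses of Eisenstein's criterion are simple divisibility conditions, so they are easy to test by machine. Here is a small sketch of such a test, applied to the coefficients of Φ_p(x + 1). The function names are our own, and a polynomial is assumed to be given as its list of integer coefficients a_0, ..., a_n.

```python
from math import comb


def eisenstein_applies(coeffs, p):
    """Check the hypotheses of Eisenstein's criterion for the prime p.

    coeffs lists a_0, ..., a_n; we need p | a_i for all i < n,
    p not dividing a_n, and p^2 not dividing a_0.
    """
    *lower, lead = coeffs
    return (all(a % p == 0 for a in lower)
            and lead % p != 0
            and coeffs[0] % (p * p) != 0)


def shifted_cyclotomic(p):
    """Coefficients (lowest degree first) of ((x + 1)^p - 1)/x.

    By the binomial theorem, the coefficient of x^(k-1) is C(p, k)
    for k = 1, ..., p.
    """
    return [comb(p, k) for k in range(1, p + 1)]
```

For p = 5, shifted_cyclotomic(5) is [5, 10, 10, 5, 1], and the criterion applies. The same formula with the composite exponent 4 gives [4, 6, 4, 1], the coefficients of (x + 1)^3 + (x + 1)^2 + (x + 1) + 1; here the test fails for the prime 2 because 4 divides the constant term.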

We do not say that x^{n-1} + x^{n-2} + ··· + x + 1 is irreducible when n is not
prime. For example, when n = 4, x^3 + x^2 + x + 1 = (x + 1)(x^2 + 1).


Exercises 

*3.80 Determine whether the following polynomials are irreducible in Q[x].

(i) f(x) = 3x^2 - 7x - 5.

(ii) f(x) = 350x^3 - 25x^2 + 34x + 1.

(iii) f(x) = 2x^3 - x - 6.

(iv) f(x) = 8x^3 - 6x - 1.

(v) f(x) = x^3 + 6x^2 + 5x + 25.

(vi) f(x) = x^5 - 4x + 2.

(vii) f(x) = x^4 + x^2 + x + 1.

(viii) f(x) = x^4 - 10x^2 + 1.

(ix) f(x) = x^6 - 210x - 616.

(x) f(x) = 350x^3 + x^2 + 4x + 1.

3.81 If p is a prime, prove that there are exactly (1/3)(p^3 - p) monic irreducible cubic
polynomials in F_p[x].

3.82 Prove that there are exactly 6 irreducible quintics in F_2[x].

3.83 (i) If a ≠ ±1 is a squarefree integer, show that x^n - a is irreducible in Q[x]

for every n ≥ 1. Conclude that there are irreducible polynomials in Q[x]
of every degree n ≥ 1.

(ii) If a ≠ ±1 is a squarefree integer, prove that ⁿ√a is irrational for every n ≥ 2.

3.84 Let k be a field, and let f(x) = a_0 + a_1 x + ··· + a_n x^n ∈ k[x] have degree n. If
f(x) is irreducible, then so is a_n + a_{n-1} x + ··· + a_0 x^n.

*3.85 Let S be a subring of a commutative ring A. If both S and A/S are finitely
generated, prove that A is finitely generated. Prove that every subgroup S of a finitely
generated abelian group A is finitely generated.

*3.86 If a is an algebraic integer and p(x) ∈ Z[x] is the monic polynomial of least
degree having a as a root, prove that p(x) is irreducible in Q[x]. One calls p(x)
the minimum polynomial of a.


3.8 Quotient Rings and Finite Fields 

The fundamental theorem of algebra states that every nonconstant polynomial
in C[x] is a product of linear polynomials in C[x]; informally, C contains all
the roots of every polynomial in C[x]. We now return to the study of ideals and
homomorphisms in order to prove a "local" analog of the Fundamental Theorem
of Algebra for polynomials over an arbitrary field k: given a polynomial f(x) ∈
k[x], there is some field K containing k that also contains all the roots of
f(x) (we call this a local analog, for even though f(x) splits in K[x], that is,
f(x) is a product of linear polynomials in K[x], other polynomials in k[x] may
not split in K[x]). The main idea behind the construction of K involves quotient
rings, a straightforward generalization of the construction of I_m, which we now
review.

Given Z and an integer m, the congruence relation on Z is defined by:

a ≡ b mod m if and only if m | (a - b).

This definition can be rewritten: a ≡ b mod m if and only if a - b = km for
some k ∈ Z, and this is equivalent to a - b ∈ (m), where (m) denotes the
principal ideal in Z generated by m. Congruence is an equivalence relation on
Z, its equivalence classes [a] are called congruence classes, and the set I_m is the
family of all the congruence classes.

We begin the new construction. Given a commutative ring R and an ideal I,
define the relation congruence mod I on R:

a ≡ b mod I if and only if a - b ∈ I.

Lemma 3.105. If R is a commutative ring and I is an ideal in R, then
congruence mod I is an equivalence relation on R.

Proof.

(i) Reflexivity: if a ∈ R, then a - a = 0 ∈ I; hence, a ≡ a mod I.

(ii) Symmetry: if a ≡ b mod I, then a - b ∈ I. Since -1 ∈ R, we have
b - a = (-1)(a - b) ∈ I, and so b ≡ a mod I.

(iii) Transitivity: if a ≡ b mod I and b ≡ c mod I, then a - b ∈ I and b - c ∈ I.
Hence, a - c = (a - b) + (b - c) ∈ I, and a ≡ c mod I. •

Definition. If R is a commutative ring and I is an ideal in R, then the
equivalence class of a ∈ R, namely,

[a] = {b ∈ R : b ≡ a mod I},

is called the congruence class of a mod I.

The set of all the congruence classes mod I is denoted by R/I.

Addition and multiplication are defined on I_m by the formulas:

[a] + [b] = [a + b] and [a][b] = [ab].

It is not obvious that these functions I_m × I_m → I_m are well-defined, and we
were obliged to prove that they are (see Propositions 2.101 and 2.103). These
formulas also give addition and multiplication operations on R/I.

Lemma 3.106. The functions

α : (R/I) × (R/I) → R/I, given by ([a], [b]) ↦ [a + b],

and

μ : (R/I) × (R/I) → R/I, given by ([a], [b]) ↦ [ab],

are well-defined operations on R/I.

Proof. Let us prove that α and μ are well-defined. Recall Lemma 2.19: if ≡ is
an equivalence relation on a set X, then [a] = [a'] if and only if a ≡ a'; here,
[a] = [a'] if and only if a - a' ∈ I. Is addition well-defined? If [a] = [a']
and [b] = [b'], is [a + b] = [a' + b']; that is, if a - a' ∈ I and b - b' ∈ I,
is (a + b) - (a' + b') ∈ I? The answer is "yes," for (a + b) - (a' + b') =
(a - a') + (b - b') ∈ I. Hence, [a + b] = [a' + b']. Is multiplication
well-defined; that is, if [a] = [a'] and [b] = [b'], is [ab] = [a'b']? Now a - a' ∈ I
and b - b' ∈ I, and so

ab - a'b' = (ab - ab') + (ab' - a'b') = a(b - b') + (a - a')b' ∈ I,

for ideals are closed under products ri for r ∈ R and i ∈ I. Therefore, [ab] =
[a'b']. •

The proof that R/I, equipped with the operations in Lemma 3.106, is a
commutative ring is, mutatis mutandis, precisely the proof that I_m is a commutative
ring. In essence, the ring axioms hold in R/I because they are inherited from
the ring axioms in R.

Theorem 3.107. If I is an ideal in a commutative ring R, then R/I, equipped 
with the addition and multiplication defined in Lemma 3.106, is a commutative 
ring. 

Proof. We check each axiom of the definition of commutative ring on page 216.

(i) Since a + b = b + a in R, we have

[a] + [b] = [a + b] = [b + a] = [b] + [a].

(ii) Now [a] + ([b] + [c]) = [a] + [b + c] = [a + (b + c)], while ([a] + [b]) + [c] =
[a + b] + [c] = [(a + b) + c], and the result follows because a + (b + c) = (a + b) + c
in R.

(iii) Define 0 = [0], where the 0 in brackets is the zero element of R. Now
0 + [a] = [0 + a] = [a], because 0 + a = a in R.

(iv) Define -[a] = [-a]. Now [-a] + [a] = [-a + a] = [0] = 0.

(v) Since ab = ba in R, we have [a][b] = [ab] = [ba] = [b][a].

(vi) Now [a]([b][c]) = [a][bc] = [a(bc)], while ([a][b])[c] = [ab][c] =
[(ab)c], and the result follows because a(bc) = (ab)c in R.

(vii) Define 1 = [1], where the 1 in brackets is the one in R. Now 1[a] = [1a] =
[a], because 1a = a in R.

(viii) We use the distributivity in R: [a]([b] + [c]) = [a][b + c] = [a(b + c)]
and [a][b] + [a][c] = [ab] + [ac] = [ab + ac] = [a(b + c)]. •


Definition. The commutative ring R/I just constructed is called the quotient
ring of R modulo I (pronounced R mod I).

Congruence classes in I_m have another description, as cosets:

[a] = {b ∈ Z : b = a + km for k ∈ Z} = a + (m).


Definition. If R is a commutative ring and I is an ideal, then a coset is a subset
of R of the form

a + I = {b ∈ R : b = a + i for some i ∈ I}.


We now show that cosets are the same as congruence classes.

Lemma 3.108. If R is a commutative ring and I is an ideal, then for each
a ∈ R, the congruence class [a] in R/I is a coset:

[a] = a + I.

Proof. If b ∈ [a], then b - a ∈ I. Hence, b = a + (b - a) ∈ a + I, and so
[a] ⊆ a + I. For the reverse inclusion, if c ∈ a + I, then c = a + i for some
i ∈ I, and so c - a ∈ I. Hence, c ≡ a mod I, c ∈ [a], and a + I ⊆ [a].
Therefore, [a] = a + I. •

The coset notation a + I is the notation most commonly used; thus,

R/I = {a + I : a ∈ R}.

If we forget the multiplication in a commutative ring R, then it is an additive
abelian group, and every ideal I in R is a subgroup. Since every subgroup of
an abelian group is a normal subgroup, the quotient group R/I is defined. We
claim that this quotient group coincides with the additive group of the quotient
ring R/I. The elements of either quotient are the same (they are cosets of I),
and the addition of cosets in each is the same as well. In particular, the principal
ideal (m) in Z is often denoted by mZ, and we have been denoting the quotient
ring Z/mZ by I_m.

The natural map π : Z → I_m, defined by π(a) = [a] = a + (m), is a ring
homomorphism, and so is its generalization to quotient rings.

Definition. If R is a commutative ring and I is an ideal, then the natural map
π : R → R/I is defined by π(a) = a + I for all a ∈ R.

We can now prove a converse to Proposition 3.38. 

Proposition 3.109.

(i) Let I be an ideal in a commutative ring R. The natural map π : R → R/I
is a surjective homomorphism whose kernel is I.

(ii) A subset J of R is an ideal if and only if J is the kernel of a homomorphism
from R to some commutative ring.

Proof.

(i) We have π(1) = 1 + I, which is the one in R/I. The definition of addition in
R/I gives

π(a) + π(b) = (a + I) + (b + I) = a + b + I = π(a + b),

and the definition of multiplication gives

π(a)π(b) = (a + I)(b + I) = ab + I = π(ab).

Therefore, π : R → R/I is a homomorphism.

The elements of R/I are cosets a + I; since a + I = π(a), the natural map
is surjective. Clearly, I ⊆ ker π, for if a ∈ I, then π(a) = a + I = 0 + I.
For the reverse inclusion, if a ∈ ker π, then π(a) = a + I = I, and so a ∈ I.
Therefore, ker π = I.

(ii) If there is a commutative ring A and a homomorphism φ : R → A with
J = ker φ, then Proposition 3.38 shows that J is an ideal. Conversely, if J is an
ideal in R, then the natural map π : R → R/J is a homomorphism whose kernel
is J. •

Proposition 2.99 describes the congruence classes [a] in I_m in a very simple
way; they are all possible remainders after dividing by m:

I_m = {[0], [1], [2], ..., [m - 1]}.

In general, there is no such easy description of the elements of R/I. On the other
hand, we shall soon see (Theorem 3.115) that there is a version of this description
when R = k[x] (when k is a field) and I = (f(x)) for f(x) ∈ k[x].

For readers familiar with groups, the proof of Theorem 3.107 can be
shortened a bit. If we forget the multiplication in a commutative ring R, then an
ideal I is a subgroup of the additive group R; since R is an abelian group, the
subgroup I is necessarily normal, and so the quotient group R/I is defined.
Thus, axioms (i) through (iv) in the definition of commutative ring hold. One
now defines multiplication, shows that it is well-defined, and verifies axioms (v)
through (viii). Presumably, quotient rings are so-called in analogy with quotient
groups. The proof of Proposition 3.109 can also be shortened a bit. The
natural map π : R → R/I is a homomorphism between the additive groups of R
and of R/I, and one need only check that π(1) = 1 + I and that it preserves
multiplication.

Theorem 3.110 (First Isomorphism Theorem). If φ : R → S is a
homomorphism of commutative rings, then ker φ is an ideal in R, im φ is a subring
of S, and there is an isomorphism

φ̃ : R/ker φ → im φ

defined by φ̃ : a + ker φ ↦ φ(a).

Proof. Let I = ker φ. We have already seen, in Proposition 3.38, that I is an
ideal in R and that im φ is a subring of S.

φ̃ is well-defined.

If a + I = b + I, then a - b ∈ I = ker φ, so that φ(a - b) = 0. But
φ(a - b) = φ(a) - φ(b); hence, φ̃(a + I) = φ(a) = φ(b) = φ̃(b + I).

φ̃ is a homomorphism.

First, φ̃(1 + I) = φ(1) = 1.
Second,

φ̃((a + I) + (b + I)) = φ̃(a + b + I)
                     = φ(a + b)
                     = φ(a) + φ(b)
                     = φ̃(a + I) + φ̃(b + I).

Third,

φ̃((a + I)(b + I)) = φ̃(ab + I)
                   = φ(ab)
                   = φ(a)φ(b)
                   = φ̃(a + I)φ̃(b + I).

φ̃ is surjective. If x ∈ im φ, then x = φ(a) for some a ∈ R, and so x =
φ̃(a + I).

φ̃ is injective. If a + I ∈ ker φ̃, then φ̃(a + I) = 0. But φ̃(a + I) = φ(a).
Hence, φ(a) = 0, a ∈ ker φ = I, and a + I = I = 0 + I. Therefore,
ker φ̃ = {0 + I}, and φ̃ is an injection. •

The proof of the first isomorphism theorem for rings can be shortened for
readers familiar with group theory. If we forget the multiplication, then the proof
of Theorem 2.114 shows that the function φ̃ : R/I → S, given by φ̃(r + I) =
φ(r), is a (well-defined) isomorphism of the additive groups. Since φ̃(1 + I) =
φ(1) = 1, it is only necessary to prove that φ̃ preserves multiplication.

The first isomorphism theorem says that there is no significant difference
between a quotient ring R/ker φ and the image of a homomorphism φ, for they
are isomorphic rings. Another viewpoint is that one can create an isomorphism
from a homomorphism once one knows its kernel and image. Given a
homomorphism, the first questions one should ask are what is its kernel and what is
its image. (There are analogs for commutative rings of the second and third
isomorphism theorems for groups, but they are less useful for rings than they are
for groups.)

Recall that the prime field of a field k is the intersection of all the subfields 
of k. 

Proposition 3.111. If k is a field, then its prime field is isomorphic to Q or to
F_p for some prime p.

Proof. Denote the one in k by ε, and consider the homomorphism χ : Z → k
defined by χ(n) = nε. Since every ideal in Z is principal, there is an integer
m ≥ 0 with ker χ = (m). If m = 0, then χ is an injection, and so im χ is a
subring of k isomorphic to Z. By Exercise 3.50 on page 249, Q = Frac(Z) ≅
Frac(im χ). Since Q is the smallest field containing Z as a subring, it follows
from Exercise 3.24 on page 232 that the prime field of k is isomorphic to Q in this
case. If m ≠ 0, the first isomorphism theorem gives I_m = Z/(m) ≅ im χ ⊆ k.
Since k is a field, im χ is a domain, and so Proposition 3.12 gives m prime. If
we now write p instead of m, then im χ = {0, ε, 2ε, ..., (p - 1)ε} is a subfield
of k isomorphic to F_p. Thus, Exercise 3.24 shows that im χ ≅ F_p is the prime
field of k in this case. •

This last result is the first step in classifying different types of fields. 

Definition. A field k has characteristic 0 if its prime field is isomorphic to Q;
if its prime field is isomorphic to F_p for some prime p, then one says that k has
characteristic p.

The fields Q, R, C, C(x) have characteristic 0, as does any subfield of the
latter three. Every finite field has characteristic p for some prime p, as does
F_p(x), the field of all rational functions over F_p.

Recall that if R is a commutative ring, r ∈ R, and n ∈ N, then nr =
r + ··· + r, where there are n summands. If ε is the one in R, then nr = (nε)r.
Thus, if nε = 0 in R, then nr = 0 for all r ∈ R. Now, in F_p, we have p[1] =
[p] = [0], and so p[r] = [0] for all [r] ∈ F_p. More generally, if k is any field of
characteristic p > 0, then pa = 0 for all a ∈ k. In particular, when p = 2, we
have 0 = 2a = a + a, and so, in this case, -a = a for all a ∈ k.

Example 3.112. 

Consider the homomorphism φ : R[x] → C, defined by f(x) ↦ f(i), where
i^2 = -1; that is, φ : Σ_k a_k x^k ↦ Σ_k a_k i^k. The first isomorphism theorem
teaches us to seek im φ and ker φ.

First, φ is surjective: if a + bi ∈ C, then a + bi = φ(a + bx) ∈ im φ. Second,

ker φ = {f(x) ∈ R[x] : f(i) = 0},

the set of all polynomials having i as a root. Of course, x^2 + 1 ∈ ker φ, and we
claim that ker φ = (x^2 + 1). Since R[x] is a PID, the ideal ker φ is generated
by the monic polynomial of least degree in it. If x^2 + 1 does not generate ker φ,
then there would be a linear divisor of x^2 + 1 in R[x]; that is, x^2 + 1 would have
a real root, which it does not. The first isomorphism theorem now gives
R[x]/(x^2 + 1) ≅ C.

Thus, the quotient ring construction builds the complex numbers from the
reals; that is, if one did not know the field of complex numbers, it could be
defined as R[x]/(x^2 + 1). One advantage of constructing C in this way is that it
is not necessary to check all the field axioms, for Theorem 3.113 shows that they
hold automatically. ◄

Theorem 3.113. If k is a field and I = (p(x)), where p(x) ∈ k[x] is
nonconstant, then the following statements are equivalent.

(i) k[x]/I is a field.

(ii) k[x]/I is a domain.

(iii) p(x) is irreducible in k[x].

Proof.

(i) ⇒ (ii). Every field is a domain.

(ii) ⇒ (iii). If p(x) is not irreducible, then there is a factorization p(x) =
g(x)h(x) in k[x] with deg(g) < deg(p) and deg(h) < deg(p). If g(x) ∈ I =
(p), then p(x) | g(x) and deg(p) ≤ deg(g), a contradiction; thus, g(x) + I ≠
0 + I in k[x]/I. Similarly, h(x) + I ≠ 0 + I in k[x]/I. However, the product

(g(x) + I)(h(x) + I) = p(x) + I = 0 + I

is zero in the quotient ring, contradicting k[x]/I being a domain. Therefore,
p(x) must be an irreducible polynomial.

(iii) ⇒ (i). Assume that p(x) is irreducible. Since p(x) is not a unit, the ideal
I = (p(x)) does not contain 1; that is, 1 + I ≠ 0 in k[x]/I. If f(x) + I ∈ k[x]/I
is nonzero, then f(x) ∉ I; that is, f(x) is not a multiple of p(x) or, to say it
another way, p ∤ f. By Lemma 3.67, p and f are relatively prime, and so
there are polynomials s and t with sf + tp = 1. Thus, sf - 1 ∈ I, and so
1 + I = sf + I = (s + I)(f + I). Therefore, every nonzero element of k[x]/I
has an inverse, and so k[x]/I is a field. •
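When k = F_p the quotient k[x]/I is finite, so the equivalence in Theorem 3.113 can be observed by exhaustive search. The sketch below is an illustration under our own conventions (the names mul_mod and is_field are hypothetical): an element of F_p[x]/(m(x)), with m(x) monic of degree n, is stored as the tuple (b_0, ..., b_{n-1}) of coefficients of its remainder, and we simply look for an inverse of every nonzero element.

```python
from itertools import product


def mul_mod(f, g, m, p):
    """Multiply f and g in F_p[x]/(m(x)).

    f and g are coefficient tuples of length n = deg(m), lowest degree
    first; m = [m_0, ..., m_{n-1}, 1] is monic.
    """
    n = len(m) - 1
    prod_ = [0] * (2 * n - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            prod_[i + j] = (prod_[i + j] + a * b) % p
    # Reduce: replace x^k (k >= n) using x^n = -(m_0 + ... + m_{n-1} x^{n-1}).
    for k in range(2 * n - 2, n - 1, -1):
        lead = prod_[k]
        prod_[k] = 0
        for i in range(n):
            prod_[k - n + i] = (prod_[k - n + i] - lead * m[i]) % p
    return tuple(prod_[:n])


def is_field(m, p):
    """Does every nonzero element of F_p[x]/(m(x)) have an inverse?"""
    n = len(m) - 1
    elements = list(product(range(p), repeat=n))
    zero = (0,) * n
    one = (1,) + (0,) * (n - 1)
    return all(any(mul_mod(f, g, m, p) == one for g in elements)
               for f in elements if f != zero)
```

For instance, is_field([1, 1, 1], 2) is True, since x^2 + x + 1 is irreducible in F_2[x], whereas is_field([1, 0, 1], 2) is False: in F_2[x]/(x^2 + 1) the coset of x + 1 is nonzero, but its square is zero.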

Compare this theorem with Proposition 3.19, which can be rephrased as
giving the equivalence of the statements: I_m is a field; I_m is a domain; m is a
prime.

Proposition 3.114.

(i) If k is a field, let p(x) ∈ k[x] be an irreducible polynomial, and let I =
(p(x)). Then k[x]/(p(x)) = k[x]/I is a field containing (an isomorphic
copy of) k and a root z of p(x).

(ii) If g(x) ∈ k[x] and z is also a root of g(x), then p(x) | g(x).

Proof.

(i) The quotient ring K = k[x]/I is a field, by Theorem 3.113, because p(x) is
irreducible. Define φ : k → K by φ(a) = a + I; φ is a homomorphism because
it is the restriction to k of the natural map k[x] → k[x]/I. By Corollary 3.45,
φ is an injection (because k is a field), and so it is an isomorphism from k to the
subfield k' = {a + I : a ∈ k} ⊆ K. Let us identify k with this subfield k' of K.
Remember that x is a particular element of k[x]; we claim that z = x + I ∈ K
is a root of p(x). Now

p(x) = a_0 + a_1 x + ··· + a_n x^n,

where a_i ∈ k for all i. In k[x]/I, we have

p(z) = (a_0 + I) + (a_1 + I)z + ··· + (a_n + I)z^n
     = (a_0 + I) + (a_1 + I)(x + I) + ··· + (a_n + I)(x + I)^n
     = (a_0 + I) + (a_1 x + I) + ··· + (a_n x^n + I)
     = a_0 + a_1 x + ··· + a_n x^n + I
     = p(x) + I = I,

because p(x) ∈ I = (p(x)). But I is the zero element of k[x]/I, and so z is a
root of p(x).

(ii) Since z is a root of g(x), we have g(x) ∈ ker π, where π : k[x] → k[x]/(p(x))
is the natural map; as ker π = (p(x)), it follows that p(x) | g(x). •

Here is a compact description of k[x]/(f(x)) that is similar to Corollary 1.56,
the description of I_m as {[0], [1], ..., [m - 1]}. Although the next theorem is true
for any, not necessarily irreducible, polynomial f(x), the most important case is
when f(x) is irreducible, for then k[x]/(f(x)) is a field.

Theorem 3.115. Let k be a field, let f(x) ∈ k[x] be a nonzero polynomial of
degree n ≥ 1, let I = (f(x)), and let K = k[x]/I. Then every element in K has
a unique expression of the form

b_0 + b_1 z + ··· + b_{n-1} z^{n-1},

where z = x + I is a root of f(x) and all b_i ∈ k.

Proof. Every element of K has the form g(x) + I, where g(x) ∈ k[x]. By
the division algorithm, there are polynomials q(x), r(x) ∈ k[x] with g(x) =
q(x)f(x) + r(x) and either r(x) = 0 or deg(r) < n = deg(f). Since g - r =
qf ∈ I, it follows that g(x) + I = r(x) + I. As in the proof of Proposition 3.114,
we may rewrite r(x) + I as r(z) = b_0 + b_1 z + ··· + b_{n-1} z^{n-1} with all b_i ∈ k.

To prove uniqueness, suppose that

r(z) = b_0 + b_1 z + ··· + b_{n-1} z^{n-1} = c_0 + c_1 z + ··· + c_{n-1} z^{n-1},

where all c_i ∈ k. Define h(x) ∈ k[x] by h(x) = Σ_{i=0}^{n-1} (b_i - c_i)x^i. Since z is
a root of h(x), we have h(x) ∈ (f(x)); that is, f(x) | h(x). If h(x) is not the
zero polynomial, then deg(h) ≥ n = deg(f). But deg(h) < n, a contradiction.
Therefore, h(x) = 0 and b_i = c_i for all i. •

Applying this theorem to Example 3.112, in which f(x) = x^2 + 1 ∈ R[x],
n = 2, and the coset x + I [where I = (x^2 + 1)] is denoted by i, we see that every
complex number has a unique expression of the form a + bi, where a, b ∈ R,
and that i^2 + 1 = 0; that is, i^2 = -1.

The easiest way to multiply in $\mathbb{C}$ is to first treat $i$ as a variable and then to
impose the condition $i^2 = -1$. For example, to compute $(a + bi)(c + di)$, first
write $ac + (ad + bc)i + bdi^2$, and then observe that $i^2 = -1$. The proper way to
multiply

$$(b_0 + b_1 z + \cdots + b_{n-1} z^{n-1})(c_0 + c_1 z + \cdots + c_{n-1} z^{n-1})$$

in the quotient ring $k[x]/(p(x))$ is to first regard the factors as polynomials in $z$ and then to
impose the condition that $p(z) = 0$. These remarks follow from the natural map
$\pi : f(x) \mapsto f(x) + I$ being a homomorphism. Since $\pi : f(x) \mapsto f(z)$, we see
that $\pi(f)\pi(g)$ is the product $f(z)g(z)$. On the other hand, $\pi(fg)$ first multiplies
$f(x)g(x)$ and then sets $x = z$.
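
The two-step recipe (multiply as polynomials in $z$, then impose $p(z) = 0$) can be sketched in code. This is only an illustration under stated assumptions: the base field is taken to be a prime field $\mathbb{F}_q$ (integers mod a prime $q$), the modulus $p(x)$ is assumed monic, polynomials are stored as coefficient lists with the constant term first, and the function name `poly_mul_mod` is hypothetical.

```python
def poly_mul_mod(f, g, p_poly, q):
    """Multiply f and g in F_q[x]/(p_poly), assuming q is prime and p_poly is monic.

    Polynomials are coefficient lists, lowest degree first.
    """
    n = len(p_poly) - 1  # degree of the modulus p(x)

    # Step 1: multiply the factors as ordinary polynomials in z.
    prod = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            prod[i + j] = (prod[i + j] + a * b) % q

    # Step 2: impose p(z) = 0 by cancelling every term of degree >= n,
    # working down from the top degree.
    for d in range(len(prod) - 1, n - 1, -1):
        c = prod[d]
        if c:
            for t in range(n + 1):
                prod[d - n + t] = (prod[d - n + t] - c * p_poly[t]) % q

    # The result is the unique representative b_0 + ... + b_{n-1} z^{n-1}.
    result = prod[:n]
    return result + [0] * (n - len(result))


# In F_2[x]/(x^2 + x + 1): z * z = z + 1.
print(poly_mul_mod([0, 1], [0, 1], [1, 1, 1], 2))  # [1, 1]
# In F_3[x]/(x^2 + 1), writing i for the coset of x: (1 + 2i)(2 + i) = 2i.
print(poly_mul_mod([1, 2], [2, 1], [1, 0, 1], 3))  # [0, 2]
```

The returned list always has length $n = \deg(p)$, which mirrors the unique expression of Theorem 3.115.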

We are now going to generalize Example 3.112. 

Definition. Let $K$ be a field and let $k$ be a subfield. If $z \in K$, then we define
$k(z)$ to be the smallest subfield of $K$ containing $k$ and $z$; that is, $k(z)$ is the
intersection of all the subfields of $K$ containing $k$ and $z$. One calls $k(z)$ the field
obtained from $k$ by adjoining $z$.

For example, $\mathbb{C} = \mathbb{R}(i)$; the complex numbers are obtained from $\mathbb{R}$ by
adjoining $i$. In Theorem 3.115, we have $K = k(z)$, where $z = x + I$.

The next result shows that quotient rings occur in nature. 

Proposition 3.116. Let $k$ be a subfield of a field $K$ and let $z \in K$.

(i) If $z$ is a root of some nonzero polynomial $f(x) \in k[x]$, then $z$ is a root of
an irreducible polynomial $p(x) \in k[x]$, and $p(x) \mid f(x)$.

(ii) For $p(x)$ in part (i), there is an isomorphism

$$\varphi : k[x]/(p(x)) \to k(z)$$

with $\varphi(x + (p(x))) = z$ and $\varphi(a) = a$ for all $a \in k$.

(iii) If $z$ and $z'$ are roots of $p(x)$ lying in $K$, then there is an isomorphism
$\theta : k(z) \to k(z')$ with $\theta(z) = z'$ and with $\theta(a) = a$ for all $a \in k$.

(iv) Each element in $k(z)$ has a unique expression of the form

$$b_0 + b_1 z + \cdots + b_{n-1} z^{n-1},$$
where $b_i \in k$ and $n = \deg(p)$.

Proof. Our proof is essentially the same as that of Proposition 3.111.

(i) Define a homomorphism $\varphi : k[x] \to K$ by $\varphi(x) = z$; in more detail,

$$\varphi : \sum b_i x^i \mapsto \sum b_i z^i.$$

Notice that $\varphi(a) = a$ for all $a \in k$. Now $f(x) \in \ker \varphi$ because $z$ is a root of
$f(x)$, and so $\ker \varphi$ is a nonzero ideal in $k[x]$; indeed, $\ker \varphi$ must have the form
$(p(x))$ for some nonzero $p(x) \in k[x]$ because $k[x]$ is a PID. Since $\operatorname{im} \varphi$ is a
subring of the field $K$, it is a domain, and so Theorem 3.113 says that $p(x)$ is
irreducible.

We may regard both $p(x)$ and $f(x)$ as lying in $K[x]$, where each is a multiple of
$x - z$; it follows that their gcd $(f, p)$ in $K[x]$ is not 1. By Corollary 3.75,
however, this gcd is the same whether computed in $K[x]$ or in $k[x]$. But $p(x)$ is
irreducible, so that $(f, p) \ne 1$ in $k[x]$ gives $p \mid f$ in $k[x]$ (by Lemma 3.67).

(ii) Since $p(x)$ is irreducible, Theorem 3.113 shows that $\operatorname{im} \varphi$ is a field; that is,
$\operatorname{im} \varphi$ is a subfield of $K$ containing $k$ and $z$, and so $k(z) \subseteq \operatorname{im} \varphi$. On the other
hand, every element in $\operatorname{im} \varphi$ has the form $g(z)$, where $g(x) \in k[x]$, so that any
subfield $S$ of $K$ containing $k$ and $z$ must also contain $\operatorname{im} \varphi$; that is, $\operatorname{im} \varphi = k(z)$.
The first isomorphism theorem now gives the isomorphism $k[x]/(p(x)) \to k(z)$.

(iii) As in part (ii), there is an isomorphism $\psi : k[x]/(p(x)) \to k(z')$ with
$\psi(a) = a$ for all $a \in k$ and $\psi(x + (p(x))) = z'$. The composite $\theta = \psi \circ \varphi^{-1}$ is
the desired isomorphism.

(iv) By Theorem 3.115, each element in $k[x]/I$, where $I = (p(x))$, has a unique
expression of the form $b_0 + b_1(x + I) + \cdots + b_{n-1}(x + I)^{n-1}$, and injectivity of the
isomorphism $k[x]/I \to k(z)$ which sends $x + I \mapsto z$ preserves this uniqueness. •


Corollary 3.117. Let $k$ be a field and $p(x) \in k[x]$ be an irreducible polynomial.
If $K = k[x]/I$, where $I = (p(x))$, and if $a \in K$, then there exists a unique monic
irreducible polynomial $h(x) \in k[x]$ having $a$ as a root.

Proof. By Theorem 3.115, $a = b_0 + b_1 z + \cdots + b_{n-1} z^{n-1}$, where $z = x + I$,
all $b_i \in k$, and $n = \deg(p)$. Thus, $a$ is a root of $b_0 + b_1 x + \cdots + b_{n-1} x^{n-1} \in
k[x]$, and Proposition 3.116(i) applies: there is a monic irreducible polynomial
$h(x) \in k[x]$ having $a$ as a root.

To prove uniqueness of $h(x)$, suppose that $g(x) \in k[x]$ is another monic
irreducible polynomial having $a$ as a root. In $K[x]$, the gcd $(h, g) \ne 1$ (for
$x - a$ is a common divisor), and so Corollary 3.75 says that $(h, g) \ne 1$ in
$k[x]$. Since $h(x)$ is irreducible, its only monic divisors are 1 and itself, and so
$(h, g) = h$. Therefore, $h(x) \mid g(x)$. But since $g(x)$ is monic irreducible, we
must have $h(x) = g(x)$. •

We now prove two important results: the first, due to Kronecker, says that
if $f(x) \in k[x]$, where $k$ is any field, then there is some larger field $E$ which
contains $k$ and all the roots of $f(x)$; the second, due to Galois, constructs finite
fields other than $\mathbb{F}_p$.

Theorem 3.118 (Kronecker). If $k$ is a field and $f(x) \in k[x]$ is nonconstant,
then there exists a field $K$, containing $k$ as a subfield, with $f(x)$ splitting in
$K[x]$; that is, $f(x)$ is a product of linear polynomials in $K[x]$.


Remark. One often says that $f(x)$ splits over $K$ if it is a product of linear
polynomials in $K[x]$. ◄

Proof. We prove the theorem by induction on $\deg(f)$, and we modify the
statement a bit to enable us to prove the inductive step more easily: if $E$ is any field
containing $k$ as a subfield (so that $f(x) \in k[x] \subseteq E[x]$), then there is a field
$K$ containing $E$ such that $f(x)$ is a product of linear polynomials in $K[x]$. If
$\deg(f) = 1$, then $f(x)$ is linear and we can choose $K = E$. For the inductive
step, we consider two cases. If $f(x)$ is not irreducible, then $f(x) = g(x)h(x)$
in $k[x]$, where $\deg(g) < \deg(f)$ and $\deg(h) < \deg(f)$. By induction, there is
a field $E$ containing $k$ with $g(x)$ splitting over $E$; a second use of the inductive
hypothesis provides a field $K$ containing $E$ with $h(x)$ splitting over $K$. Thus,
$f(x) = g(x)h(x)$ splits over $K$. If $f(x)$ is irreducible in $k[x]$, then
Proposition 3.114(i) provides a field $E$ containing $k$ and a root $z$ of $f(x)$. Hence,
$f(x) = (x - z)\ell(x)$ in $E[x]$. By induction, there is a field $K$ containing $E$ so
that $\ell(x)$, and hence $f(x) = (x - z)\ell(x)$, splits over $K$. •

For the familiar fields $\mathbb{Q}$, $\mathbb{R}$, and $\mathbb{C}$, Kronecker's theorem offers nothing new.
The Fundamental Theorem of Algebra, first proved by Gauss in 1799 (completing
earlier attempts of Euler and of Lagrange), says that every nonconstant
$f(x) \in \mathbb{C}[x]$ has a root in $\mathbb{C}$. It follows, by induction on the degree of $f(x)$, that
all the roots of $f(x)$ lie in $\mathbb{C}$; that is, $f(x) = a(x - r_1) \cdots (x - r_n)$, where $a \in \mathbb{C}$
and $r_j \in \mathbb{C}$ for all $j$. On the other hand, if $k = \mathbb{F}_p$ or $k = \mathbb{C}(x) = \operatorname{Frac}(\mathbb{C}[x])$,
then Kronecker's theorem does apply to tell us, for any given $f(x)$, that there
is always some larger field $E$ that contains all the roots of $f(x)$. For example,
there is some field containing $\mathbb{C}(x)$ and $\sqrt{x}$. There is a general version of the
fundamental theorem: every field $k$ is a subfield of an algebraically closed field
$K$, that is, $K$ is a field containing $k$ such that every $f(x) \in k[x]$ is a product of
linear polynomials in $K[x]$. In contrast, Kronecker's theorem gives roots of just
one polynomial at a time.

We now consider finite fields; that is, fields having only finitely many
elements. Our first goal is to prove Proposition 3.119: every finite field has exactly
$p^n$ elements for some prime $p$ and some $n \ge 1$. Our first proof of this result uses
group theory (also see Exercise 3.102 on page 305).

Proposition 3.119. If $E$ is a finite field, then $|E| = p^n$ for some prime $p$ and
some $n \ge 1$.

Proof. If $k$ is the prime field of $E$, then Proposition 3.111 says that $k \cong \mathbb{Q}$ or
$k \cong \mathbb{F}_p$ for some prime $p$; since $\mathbb{Q}$ is infinite, we have $k$ of characteristic $p$.
Therefore, $pa = 0$ for all $a \in E$; that is, as an additive abelian group, every
nonzero element in $E$ has order $p$. If there is a prime divisor $q$ of $|E|$ with
$q \ne p$, then Cauchy's theorem (Theorem 2.145) gives a nonzero element $b \in E$
with $qb = 0$, contradicting every nonzero element having order $p$. We conclude
that $|E| = p^n$ for some $n \ge 1$. •

There is an elegant proof of Proposition 3.119 using linear algebra
(Proposition 4.29), and many people view linear algebra as a part of commutative ring
theory. However, for those readers who have not yet learned groups or linear
algebra, here is an awkward proof using only commutative rings.

Lemma 3.120. Let $E$ be a finite field, and let $k$ be its prime field.

(i) There is a prime $p$ with $k \cong \mathbb{F}_p$.

(ii) There is an integer $M > 0$ so that every nonzero $z \in E$ is a root of $x^M - 1$.

(iii) If $S$ is a subfield of $E$ and $z \in E$, then $|S(z)| = |S|^m$ for some $m \ge 1$.

Proof.

(i) Since $E$ is finite, it cannot contain a copy of $\mathbb{Q}$, and so Proposition 3.111 says
that its prime field $k \cong \mathbb{F}_p$ for some prime $p$.

(ii) Let $z \in E$ be nonzero. Since $E$ is finite, there must be repetitions on the list
$1, z, z^2, \ldots$. Let the first repetition occur at $m$; that is, $1, z, z^2, \ldots, z^{m-1}$ are all
distinct, while $z^m \in \{1, z, z^2, \ldots, z^{m-1}\}$. If $z^m \ne 1$, then $z^m = z^i$ for some
$i$ with $0 < i < m$, and so $z^{m-i} = 1$. But $0 < m - i \le m - 1$, contradicting $1, z, z^2, \ldots, z^{m-1}$ all
being distinct. Therefore, $z^m = 1$. For each nonzero $z \in E$, we have found an
integer $m = m(z)$ with $z^{m(z)} = 1$. Since there are only finitely many elements in
$E$, we may define $M = \prod_{z \in E^\times} m(z)$ (where $E^\times$ denotes the nonzero elements
of $E$), and $z^M = 1$ for all $z \in E^\times$. Hence, every nonzero $z \in E$ is a root of
$x^M - 1$.

(iii) We claim that $z$ is a root of an irreducible polynomial $q(x) \in S[x]$. If $z = 0$,
take $q(x) = x$; if $z \ne 0$, then part (ii) shows that $z$ is a root of a nonzero
polynomial in $S[x]$, namely, $x^M - 1$, and Proposition 3.116 gives an irreducible
polynomial $q(x) \in S[x]$ having $z$ as a root. Let $X = S \times \cdots \times S$ be the cartesian
product of $d$ copies of $S$ with itself, where $d = \deg(q)$. By Proposition 3.116,
the function $\beta : S(z) \to X$, defined by

$$\beta : b_0 + b_1 z + \cdots + b_{d-1} z^{d-1} \mapsto (b_0, b_1, \ldots, b_{d-1}),$$

is a bijection. Hence, $|S(z)| = |X| = |S|^d$. •
Proposition 3.121. If $E$ is a finite field, then $|E| = p^n$ for some prime $p$ and
some $n \ge 1$.

Proof. Since $E$ is finite, Lemma 3.120(i) says that its prime field $k \cong \mathbb{F}_p$ for
some prime $p$. If $k = E$, we are done. Otherwise, there is some element $z_1 \in E$
with $z_1 \notin k$. By Lemma 3.120(iii), $|k(z_1)| = p^m$. If $k(z_1) = E$, we are done.
Otherwise, there is some element $z_2 \in E$ with $z_2 \notin k(z_1)$. If $K = k(z_1)$, then
Lemma 3.120(iii) gives $|K(z_2)| = (p^m)^n = p^{mn}$. If $K(z_2) = E$, we are done.
This procedure must end after a finite number of steps because there are only
finitely many elements in $E$. •

The next group-theoretic result will imply that every finite field $E$ contains
a primitive element $\pi$; that is, there is an element $\pi \in E$ with every nonzero
$a \in E$ equal to some power of $\pi$.

Theorem 3.122.

(i) If $k$ is a field and $G$ is a finite subgroup of the multiplicative group $k^\times$,
then $G$ is cyclic. In particular, if $k$ itself is finite (e.g., $k = \mathbb{F}_p$), then $k^\times$ is
cyclic.

(ii) For each positive integer $m$, there exists a primitive $m$th root of unity
$z \in k$; that is, every $m$th root of unity in $k$ is a power of $z$.

Proof.

(i) Let $d$ be a divisor of $|G|$. If there are two subgroups of $G$ of order $d$, say, $S$
and $T$, then $|S \cup T| > d$. But each $a \in S \cup T$ satisfies $a^d = 1$, and hence it is a
root of $x^d - 1$. This contradicts Theorem 3.50(i), for $x^d - 1$ now has too many
roots in $k$. Thus, $G$ is cyclic, by Proposition 2.73.

(ii) The set $\Gamma_m = \{\text{all } m\text{th roots of unity in } k\}$ is a (finite) subgroup of $k^\times$, and
so it is cyclic. A generator of $\Gamma_m$ is a primitive $m$th root of unity. •

Although the multiplicative groups $\mathbb{F}_p^\times$ are cyclic, no explicit formula giving
generators of each of them is known; i.e., no efficient algorithm is known that
computes $s(p)$ for every prime $p$, where $[s(p)]$ is a generator of $\mathbb{F}_p^\times$.
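
For small primes a generator can still be found by exhaustive search. The sketch below is only an illustration: it tries each candidate in turn and computes its multiplicative order directly, so it is not the efficient algorithm whose existence is in question. The function names `multiplicative_order` and `smallest_generator` are hypothetical, and `p` is assumed to be prime.

```python
def multiplicative_order(a, p):
    """Order of a in the multiplicative group of F_p (a not divisible by the prime p)."""
    order, power = 1, a % p
    while power != 1:
        power = (power * a) % p
        order += 1
    return order


def smallest_generator(p):
    """Smallest s in 1..p-1 whose class generates the cyclic group of nonzero elements of F_p."""
    for a in range(1, p):
        if multiplicative_order(a, p) == p - 1:
            return a
    raise ValueError("no generator found; is p prime?")


for p in (7, 11, 17):
    print(p, smallest_generator(p))
# 7 3
# 11 2
# 17 3
```

Theorem 3.122 guarantees that the loop in `smallest_generator` succeeds when `p` is prime; it says nothing about how soon.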

This result gives another proof of Proposition 3.121 and a little more. It
proves that there is some $\pi \in E$ so that $E = \mathbb{F}_p(\pi)$.

Corollary 3.123 (= Proposition 3.121). Every finite field $E$ has exactly $p^n$
elements for some prime $p$ and some $n \ge 1$.

Proof. Since $E$ is finite, its prime field $k \cong \mathbb{F}_p$ for some prime $p$. There is
a primitive element $\pi \in E$, by Theorem 3.122, and so $|k(\pi)| = p^n$ for some
$n \ge 1$, by Lemma 3.120(iii). But $k(\pi) = E$, because the powers of $\pi$ already
give all of $E^\times$. •
We are now going to display the finite fields. We saw, in Lemma 3.120(ii),
that every nonzero element of $E$ is a root of $x^M - 1$ for some integer $M$. It follows
that every element of $E$, including 0, is a root of $x^{M+1} - x$. A consequence of
the next proof is that $M + 1$ can be chosen to be a power of $p$.


Theorem 3.124 (Galois). If $p$ is a prime and $n$ is a positive integer, then there
exists a field that has exactly $p^n$ elements.

Proof. Write $q = p^n$, and consider the polynomial

$$g(x) = x^q - x \in \mathbb{F}_p[x].$$

By Kronecker's theorem, there is a field $E$ containing $\mathbb{F}_p$ such that $g(x)$ is a
product of linear factors in $E[x]$. Define

$$F = \{a \in E : g(a) = 0\};$$

thus, $F$ is the set of all the roots of $g(x)$. Since the derivative $g'(x) = qx^{q-1} - 1
= p^n x^{q-1} - 1 = -1$, it follows that the gcd $(g, g') = 1$. By Exercise 3.63
on page 271, all the roots of $g(x)$ are distinct; that is, $F$ has exactly $q = p^n$
elements.

We claim that $F$ is a subfield of $E$, and this will complete the proof. If
$a, b \in F$, then $a^q = a$ and $b^q = b$. Therefore, $(ab)^q = a^q b^q = ab$, and
$ab \in F$. By Exercise 3.47(iii) on page 249, $(a - b)^q = a^q - b^q = a - b$, so that
$a - b \in F$. Finally, if $a \ne 0$, then the cancellation law applied to $a^q = a$ gives
$a^{q-1} = 1$, and so the inverse of $a$ is $a^{q-2}$ (which lies in $F$ because $F$ is closed
under multiplication). •

In Corollary 5.25, we will see that any two finite fields k with the same 
number of elements are isomorphic. Here are some special cases of this theorem. 


Example 3.125. 

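In Exercise 3.17 on page 232, we constructed a field $k$ with 4 elements as all the
matrices

$$\begin{bmatrix} a & b \\ b & a+b \end{bmatrix} = a\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + b\begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix},$$

where $a, b \in \mathbb{I}_2$. (We remark that $\begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix}$
is a primitive element, as is $\begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}$.)

On the other hand, we may construct a field with 4 elements as the quotient
$F = \mathbb{F}_2[x]/(x^2 + x + 1)$, for $x^2 + x + 1 \in \mathbb{F}_2[x]$ is irreducible. By Theorem 3.115,
$F$ is a field consisting of all $a + bz$, where $z$ is a root of $x^2 + x + 1$ and $a, b \in \mathbb{F}_2$.
Since $z^2 + z + 1 = 0$, we have $z^2 = -z - 1 = z + 1$; moreover, $z^3 = zz^2 =
z(z + 1) = z^2 + z = 1$. It is now easy to see that there is a ring isomorphism
$\varphi : k \to F$ with $\varphi : \begin{bmatrix} a & b \\ b & a+b \end{bmatrix} \mapsto a + bz$. ◄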
Example 3.126.

According to the table in Example 3.100, there are three monic irreducible
quadratics in $\mathbb{F}_3[x]$, namely,

$$p(x) = x^2 + 1, \quad q(x) = x^2 + x - 1, \quad \text{and} \quad r(x) = x^2 - x - 1;$$

each gives rise to a field with $9 = 3^2$ elements. Let us look at the first two in
more detail. Theorem 3.115 says that $E = \mathbb{F}_3[x]/(p(x))$ is given by

$$E = \{a + b\alpha : \text{where } \alpha^2 + 1 = 0\}.$$

Similarly, if $F = \mathbb{F}_3[x]/(q(x))$, then

$$F = \{a + b\beta : \text{where } \beta^2 + \beta - 1 = 0\}.$$

The reader can show that these two fields are isomorphic by checking that the
function $\varphi : E \to F$, defined by

$$\varphi(a + b\alpha) = a + b(1 - \beta),$$

is an isomorphism. (The key point is that $(1 - \beta)^2 = 1 - 2\beta + \beta^2 = 2 - 3\beta = -1$ in $F$.)

Now $\mathbb{F}_3[x]/(x^2 - x - 1)$ is also a field with 9 elements, and one can show
that it is isomorphic to both of the two fields $E$ and $F$ given above.

In Example 3.100, we exhibited 8 monic irreducible cubics $p(x) \in \mathbb{F}_3[x]$;
each of them gives rise to a field $\mathbb{F}_3[x]/(p(x))$ having $27 = 3^3$ elements. All
eight of these fields are isomorphic. ◄
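
The isomorphism between the two 9-element fields can be confirmed by brute force. In the sketch below (an illustration only), an element $a + b\alpha$ with $\alpha^2 + 1 = 0$ and an element $a + b\beta$ with $\beta^2 + \beta - 1 = 0$ are both stored as pairs `(a, b)` of integers mod 3; the helper names `mul_alpha`, `mul_beta`, `add`, and `phi` are hypothetical.

```python
from itertools import product

Q = 3  # both fields have prime field F_3


def add(u, v):
    """Coordinatewise addition; the same formula works in both fields."""
    return ((u[0] + v[0]) % Q, (u[1] + v[1]) % Q)


def mul_alpha(u, v):
    """(a + b*alpha)(c + d*alpha), using alpha^2 = -1."""
    a, b = u
    c, d = v
    return ((a * c - b * d) % Q, (a * d + b * c) % Q)


def mul_beta(u, v):
    """(a + b*beta)(c + d*beta), using beta^2 = 1 - beta."""
    a, b = u
    c, d = v
    return ((a * c + b * d) % Q, (a * d + b * c - b * d) % Q)


def phi(u):
    """a + b*alpha  ->  a + b*(1 - beta) = (a + b) - b*beta."""
    a, b = u
    return ((a + b) % Q, (-b) % Q)


elements = list(product(range(Q), repeat=2))
is_bijection = len({phi(u) for u in elements}) == len(elements)
preserves_add = all(phi(add(u, v)) == add(phi(u), phi(v))
                    for u in elements for v in elements)
preserves_mul = all(phi(mul_alpha(u, v)) == mul_beta(phi(u), phi(v))
                    for u in elements for v in elements)
print(is_bijection, preserves_add, preserves_mul)  # True True True
```

Checking all 81 pairs is feasible only because the fields are tiny; the check is a sanity test, not a substitute for the algebraic argument.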


Exercises 

3.87 For every commutative ring $R$, prove that $R[x]/(x) \cong R$.

*3.88 Let $k$ be a field and $f(x), g(x) \in k[x]$ be relatively prime. If each divides $h(x)$ in
$k[x]$, prove that their product $f(x)g(x)$ also divides $h(x)$.

3.89 Define $\varphi : \mathbb{Z} \to \mathbb{I}_m \times \mathbb{I}_n$ by $\varphi : a \mapsto ([a]_m, [a]_n)$, where $[a]_m$ is the congruence
class of $a$ in $\mathbb{I}_m$.

(i) Prove that $\varphi$ is a homomorphism, and that $\ker \varphi = (m) \cap (n)$.

(ii) If $(m, n) = 1$, prove that $(m) \cap (n) = (mn)$.

(iii) If $(m, n) = 1$, prove that $\varphi$ is surjective.

(iv) Use part (iii) to prove the Chinese remainder theorem.

3.90 Chinese Remainder Theorem in $k[x]$.

(i) Prove that if $k$ is a field and $f(x), f'(x) \in k[x]$ are relatively prime, then
given $b(x), b'(x) \in k[x]$, there exists $c(x) \in k[x]$ with

$$c - b \in (f) \quad \text{and} \quad c - b' \in (f');$$

moreover, if $d(x)$ is another common solution, then $c - d \in (ff')$.

(ii) Prove that if $k$ is a field and $f(x), g(x) \in k[x]$ are relatively prime, then
$k[x]/(f(x)g(x)) \cong k[x]/(f(x)) \times k[x]/(g(x))$.

*3.91 Generalize Exercise 3.77 on page 278 by proving that if $k$ is a field of
characteristic 0 and if $p(x) \in k[x]$ is an irreducible polynomial, then $p(x)$ has no repeated
roots.

3.92 (i) Prove that a field $K$ cannot have subfields $k'$ and $k''$ with $k' \cong \mathbb{Q}$ and
$k'' \cong \mathbb{F}_p$ for some prime $p$.

(ii) Prove that a field $K$ cannot have subfields $k'$ and $k''$ with $k' \cong \mathbb{F}_p$ and
$k'' \cong \mathbb{F}_q$, where $p \ne q$.

3.93 Let $k$ be a subfield of a field $K$, and let $p(x) \in k[x]$ be irreducible. If $z, z' \in K$ are
roots of $p(x)$, prove that there is an isomorphism $\theta : k(z) \to k(z')$ with $\theta(z) = z'$
and $\theta(a) = a$ for all $a \in k$.

*3.94 Prove that every element $z$ in a finite field $E$ is a sum of two squares. (If $z = a^2$ is
a square, then we may write $z = a^2 + 0^2$.)

*3.95 If $p$ is a prime and $p \equiv 3 \bmod 4$, prove that either $a^2 \equiv 2 \bmod p$ is solvable or
$a^2 \equiv -2 \bmod p$ is solvable.

*3.96 (i) Prove that $x^4 + 1$ factors in $\mathbb{F}_2[x]$.

(ii) If $x^4 + 1 = (x^2 + ax + b)(x^2 + cx + d) \in \mathbb{F}_p[x]$, where $p$ is an odd
prime, prove that $c = -a$ and

$$d + b - a^2 = 0, \qquad a(d - b) = 0, \qquad bd = 1.$$

(iii) Prove that $x^4 + 1$ factors in $\mathbb{F}_p[x]$, where $p$ is an odd prime, if any of the
following congruences are solvable:

$$b^2 \equiv -1 \bmod p, \qquad a^2 \equiv \pm 2 \bmod p.$$

(iv) Prove that $x^4 + 1$ factors in $\mathbb{F}_p[x]$ for all primes $p$.

*3.97 Generalize Proposition 3.116(iii) as follows. Let $\varphi : k \to k'$ be an isomorphism
of fields, let $E/k$ and $E'/k'$ be extensions, let $p(x) \in k[x]$ and $p^*(x) \in k'[x]$
be irreducible polynomials (as in Exercise 3.44 on page 248, if $p(x) = \sum a_i x^i$,
then $p^*(x) = \sum \varphi(a_i) x^i$), and let $z \in E$ and $z' \in E'$ be roots of $p(x)$, $p^*(x)$,
respectively. Then there exists an isomorphism $\tilde{\varphi} : k(z) \to k'(z')$ with $\tilde{\varphi}(z) = z'$
and with $\tilde{\varphi}$ extending $\varphi$.

$$k(z) \dashrightarrow k'(z')$$

3.98 If $F$ is a field with four elements, prove that the stochastic group $\Sigma(2, F) \cong A_4$.
*3.99 Let $f(x) = a_0 + a_1 x + \cdots + a_{n-1} x^{n-1} + x^n \in k[x]$, where $k$ is a field, and suppose
that $f(x) = (x - r_1)(x - r_2) \cdots (x - r_n) \in E[x]$, where $E$ is some field containing
$k$. Prove that

$$a_{n-1} = -(r_1 + r_2 + \cdots + r_n) \quad \text{and} \quad a_0 = (-1)^n r_1 r_2 \cdots r_n.$$

Conclude that the sum and the product of all the roots of $f(x)$ lie in $k$.

3.100 If $E = \mathbb{F}_2[x]/(p(x))$, where $p(x) = x^3 + x + 1$, then $E$ is a field with 8 elements.
Show that a root $\pi$ of $p(x)$ is a primitive element in $E$ by writing every nonzero
element of $E$ as a power of $\pi$.

3.101 (i) Prove, for all $n \ge 1$, that there is an irreducible polynomial of degree $n$
in $\mathbb{Q}[x]$.

(ii) Prove, for all $n \ge 1$ and every prime $p$, that there is an irreducible
polynomial of degree $n$ in $\mathbb{F}_p[x]$.

3.102 This exercise gives yet another proof, using group theory, of Proposition 3.121.

(i) Let $E$ be a finite field, but consider it only as a group under addition.
Show, for each pair $x$ and $y$ of nonzero elements in $E$, that there is an
isomorphism $\varphi : E \to E$ with $y = \varphi(x)$.

(ii) Prove that $|E| = p^n$ for some prime $p$ and some $n \ge 1$.

3.103 Prove that the matrices A = g] and B = [?q] generate a subgroup of 
SL(2, 15) of order 8 which is isomorphic to Q. 


3.9 Officers, Magic, Fertilizer, and Horizons 


Officers

In 1782, L. Euler posed the following problem in an article he was writing 
about magic squares. Suppose there are 36 officers of 6 ranks and from 6 reg- 
iments. If the regiments are numbered 1 through 6 and the ranks are captain, 
major, lieutenant, . . . , then each officer has a double label: e.g., captain 3 or 
major 4. Euler asked whether there is a 6 x 6 formation of these officers so 
that each row and each column contains exactly one officer of each rank and one 
officer from each regiment. Thus, no row can have two captains in it, nor can 
any column; no row can have two officers from the same regiment, nor can any 
column. 

The problem is made clearer by the following definitions. 

Definition. An $n \times n$ Latin square is an $n \times n$ matrix whose entries are taken
from a set $X$ with $n$ elements (e.g., $X = \{0, 1, \ldots, n - 1\}$) so that no element
occurs twice in any row or in any column.

It is easy to see that an $n \times n$ matrix $A$ (whose entries lie in a set $X$ with
$|X| = n$) is a Latin square if and only if every row and every column of $A$ is a
permutation of $X$.

As is customary, we may denote a matrix $A$ by $A = [a_{ij}]$, where $a_{ij}$ are its
entries; the first index $i$ of $a_{ij}$ describes the $i$th row

$$a_{i1} \quad a_{i2} \quad \cdots \quad a_{in}$$

and the second index $j$ describes the $j$th column

$$\begin{matrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{nj} \end{matrix}$$

Example 3.127. 

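There are exactly two $2 \times 2$ Latin squares having entries 0 and 1:

$$A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.$$

◄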


Example 3.128. 

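Here are two $4 \times 4$ Latin squares.

$$A = \begin{bmatrix} 0 & 1 & 2 & 3 \\ 1 & 0 & 3 & 2 \\ 2 & 3 & 0 & 1 \\ 3 & 2 & 1 & 0 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 0 & 1 & 2 & 3 \\ 2 & 3 & 0 & 1 \\ 3 & 2 & 1 & 0 \\ 1 & 0 & 3 & 2 \end{bmatrix}$$

◄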

Example 3.129. 

The multiplication table of a finite group $G = \{a_1, \ldots, a_n\}$ of order $n$, namely,
$[a_i a_j]$, is a Latin square. Since the cancellation laws hold in groups, $a_i a_j = a_i a_k$
implies $a_j = a_k$, and so the $i$th row is a permutation of $G$; since $a_i a_j = a_k a_j$
implies $a_i = a_k$, the $j$th column is a permutation of $G$. ◄
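
As a small check of this example, the sketch below builds the multiplication table of the symmetric group $S_3$ and tests the Latin square condition. It is only an illustration: permutations are stored as tuples, and the names `compose`, `cayley_table`, and `is_latin_square` are hypothetical.

```python
from itertools import permutations


def compose(s, t):
    """Composite permutation: (s o t)(i) = s[t[i]]."""
    return tuple(s[t[i]] for i in range(len(t)))


def cayley_table(elements):
    """Multiplication table [a_i a_j] of a finite group given as a list."""
    return [[compose(a, b) for b in elements] for a in elements]


def is_latin_square(m):
    """True if every row and every column of the square matrix m is a
    permutation of the same n-element set, where n is the size of m."""
    n = len(m)
    symbols = set(m[0])
    rows_ok = all(len(row) == n and set(row) == symbols for row in m)
    cols_ok = all({row[j] for row in m} == symbols for j in range(n))
    return len(symbols) == n and rows_ok and cols_ok


S3 = list(permutations(range(3)))  # the six elements of S_3
table = cayley_table(S3)
print(len(table), is_latin_square(table))  # 6 True
```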


We are going to use the following construction, which is usually the first 
attempt of a neophyte defining matrix multiplication. 


Definition. If $A = [a_{ij}]$ and $B = [b_{ij}]$ are $m \times n$ matrices, then their Hadamard
product, denoted by $A \circ B$, is the $m \times n$ matrix whose $ij$ entry is the ordered
pair $(a_{ij}, b_{ij})$.

If the entries of $A$ and of $B$ lie in a commutative ring $R$, then one often
replaces the $ij$ entry of the Hadamard product, $(a_{ij}, b_{ij})$, by the product $a_{ij} b_{ij}$
in $R$.

Suppose that the entries of $A$ lie in a set $X$ with $|X| = n$, and the entries of
$B$ lie in a set $Y$ with $|Y| = n$. There are exactly $n^2$ ordered pairs in $X \times Y$, and
we say that $A$ and $B$ are orthogonal if every ordered pair occurs as an entry in
their Hadamard product $A \circ B$.


Definition. Two $n \times n$ Latin squares $A = [a_{ij}]$ and $B = [b_{ij}]$, with entries in
sets $X$ and $Y$ with $|X| = n = |Y|$, respectively, are called orthogonal if all the
entries in their Hadamard product $A \circ B$, namely, all the ordered pairs $(a_{ij}, b_{ij})$,
are distinct.

For example, there is no orthogonal pair of $2 \times 2$ Latin squares: as we saw in
Example 3.127, there are only two $2 \times 2$ Latin squares with entries in $X = \{0, 1\}$,
and their Hadamard product is

$$A \circ B = \begin{bmatrix} 01 & 10 \\ 10 & 01 \end{bmatrix}.$$

There are only two distinct ordered pairs, not four as the definition requires.


Example 3.130. 

The two $4 \times 4$ Latin squares in Example 3.128 are orthogonal, for all 16 ordered
pairs are distinct.

$$A \circ B = \begin{bmatrix} 00 & 11 & 22 & 33 \\ 12 & 03 & 30 & 21 \\ 23 & 32 & 01 & 10 \\ 31 & 20 & 13 & 02 \end{bmatrix}$$

Let $A$ be a matrix whose entries lie in a set $X$. If $\alpha : x \mapsto x'$ is a permutation
of $X$, then applying $\alpha$ to each entry in $A$ yields a new matrix $A'$.


Lemma 3.131. Let $A = [a_{ij}]$ be a Latin square whose entries lie in a set $X$
with $n$ elements. If $x \mapsto x'$ is a permutation of $X$, then $A' = [(a_{ij})']$ is a Latin
square; that is, if $x = a_{ij}$ is the $ij$ entry of $A$, then $x'$ is the $ij$ entry of $A'$.
Moreover, if $A$ and $B = [b_{ij}]$ are orthogonal Latin squares, then $A'$ and $B$ are
also orthogonal.

Proof. As the $i$th row $(a_{i1}, \ldots, a_{in})$ of $A$ is a permutation of $X$, so is the $i$th
row $((a_{i1})', \ldots, (a_{in})')$ of $A'$ (the composite of two permutations is again a
permutation). A similar argument shows that the columns of $A'$ are permutations of
$X$, and so $A'$ is a Latin square.

If $A'$ and $B$ are not orthogonal, then two entries of $A' \circ B$ are equal: say,
$(a'_{ij}, b_{ij}) = (a'_{kl}, b_{kl})$, so that $a'_{ij} = a'_{kl}$ and $b_{ij} = b_{kl}$. Since priming is a
permutation, $a_{ij} = a_{kl}$. Thus, there is a repeated ordered pair in $A \circ B$, contradicting
the orthogonality of $A$ and $B$. Hence, $A'$ and $B$ are orthogonal. •

Euler’s problem asks whether there is a pair of orthogonal 6x6 Latin squares 
(the first index denotes the rank and the second index denotes the regiment). 
Euler was more interested in orthogonal Latin squares than he was in officers. 
To see why he cared about the case n = 6, let us first construct some orthogonal 
pairs. 

Proposition 3.132.

(i) If $k$ is a finite field and $a \in k^\times = k - \{0\}$, then the $|k| \times |k|$ matrix

$$L_a = [ax + y],$$

where $x, y \in k$, is a Latin square.

(ii) If $a, b \in k^\times$ and $b \ne a$, then $L_a$ and $L_b$ are orthogonal Latin squares.

Proof.

(i) The $x$th row of $L_a$ consists of the elements $ax + y$, where $x$ is fixed. These
are all distinct, for if $ax + y = ax + y'$, then $y = y'$. Similarly, the $y$th column of
$L_a$ consists of elements $ax + y$, where $y$ is fixed, and these are distinct because
$ax + y = ax' + y$ implies $ax = ax'$. Since $a \ne 0$, the cancellation law gives
$x = x'$.

(ii) Suppose that two ordered pairs coincide; say,

$$(ax + y, bx + y) = (ax' + y', bx' + y').$$

Thus, $ax + y = ax' + y'$ and $bx + y = bx' + y'$. There result equations

$$a(x - x') = y' - y = b(x - x').$$

Since $a \ne b$, the cancellation law says that $x - x' = 0$, and so $y' - y = 0$, i.e.,
$x' = x$ and $y' = y$. Therefore, $L_a$ and $L_b$ are orthogonal Latin squares. •
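
The construction can be tried out directly when $k$ is a prime field. The sketch below is an illustration under that assumption: arithmetic is done with integers mod a prime `p`, so it does not cover fields with $p^e$ elements for $e > 1$, and the names `latin_square_L` and `are_orthogonal` are hypothetical.

```python
def latin_square_L(a, p):
    """The matrix L_a = [a*x + y] over F_p, rows indexed by x and columns by y."""
    if a % p == 0:
        raise ValueError("a must be nonzero in F_p")
    return [[(a * x + y) % p for y in range(p)] for x in range(p)]


def are_orthogonal(m1, m2):
    """True if the n*n ordered pairs (m1[i][j], m2[i][j]) are all distinct."""
    pairs = {(u, v) for r1, r2 in zip(m1, m2) for u, v in zip(r1, r2)}
    return len(pairs) == len(m1) ** 2


p = 5
squares = {a: latin_square_L(a, p) for a in range(1, p)}
all_orthogonal = all(are_orthogonal(squares[a], squares[b])
                     for a in squares for b in squares if a != b)
print(all_orthogonal)        # True
print(latin_square_L(2, 3))  # [[0, 1, 2], [2, 0, 1], [1, 2, 0]]
```

For $p = 5$ this produces four Latin squares, any two of which are orthogonal, in agreement with part (ii).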

Corollary 3.133. For every prime power $p^e > 2$, there exists a pair of
orthogonal $p^e \times p^e$ Latin squares.

Proof. By Galois' theorem, there exists a finite field $k$ with $|k| = p^e$. In order
to have an orthogonal pair of Latin squares, we need $|k^\times| \ge 2$; that is, $p^e - 1 \ge 2$,
hence $p^e > 2$. •

Remark. Galois invented finite fields around 1830, so that Euler, in 1782,
constructed orthogonal $p^e \times p^e$ Latin squares in a different (more complicated)
way. ◄

We now show how to create large orthogonal Latin squares from small ones.
Let $K$ and $L$ be sets with $|K| = k$ and $|L| = l$. If $a \in K$ and $B = [b_{ij}]$ is an $l \times l$ matrix
with entries in $L$, then $aB$ is the $l \times l$ matrix whose $ij$ entry is $ab_{ij}$ [where $ab_{ij}$
abbreviates the ordered pair $(a, b_{ij})$]. If $A = [a_{ij}]$ is a $k \times k$ matrix whose entries
lie in $K$, then the Kronecker product $A \otimes B$ of $A$ and $B$ is the $kl \times kl$ matrix

$$A \otimes B = \begin{bmatrix} a_{11}B & a_{12}B & \cdots & a_{1k}B \\ a_{21}B & a_{22}B & \cdots & a_{2k}B \\ \vdots & \vdots & & \vdots \\ a_{k1}B & a_{k2}B & \cdots & a_{kk}B \end{bmatrix}.$$

Theorem 3.134 (Euler). If $n \not\equiv 2 \bmod 4$, then there exists an orthogonal pair
of $n \times n$ Latin squares.

Proof. We merely state the main steps of the proof. One shows first that if $A$
and $B$ are Latin squares, then $A \otimes B$ is a Latin square. Second, one proves that if
$A$ and $A'$ are orthogonal $k \times k$ Latin squares, and if $B$ and $B'$ are orthogonal $l \times l$
Latin squares, then $A \otimes B$ and $A' \otimes B'$ are orthogonal $kl \times kl$ Latin squares.
Neither of these steps is challenging. Of course, one can form the Kronecker
product of a finite number of matrices.

A positive integer $n$ is odd if and only if $n \equiv 1 \bmod 4$ or $n \equiv 3 \bmod 4$; in
either case, $n = p_1^{e_1} \cdots p_t^{e_t}$, where the $p_i$ are odd primes. A positive integer
$n \equiv 0 \bmod 4$ if and only if $n = 2^e m$, where $e \ge 2$ and $m$ is odd. Therefore,
$n \not\equiv 2 \bmod 4$ if and only if $n = 2^e p_1^{e_1} \cdots p_t^{e_t}$, where $e \ne 1$ and the $p_i$ are odd
primes. By Corollary 3.133, there is an orthogonal pair of $p_i^{e_i} \times p_i^{e_i}$ Latin squares
for each $i$ and, if $e \ge 2$, an orthogonal pair of $2^e \times 2^e$ Latin squares. Taking the
Kronecker product of these gives a pair of orthogonal $n \times n$ Latin squares. •
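
The two steps that the proof leaves to the reader can at least be checked on an example. The sketch below (an illustration only) forms Kronecker products of a $3 \times 3$ orthogonal pair, taken to be $L_1$ and $L_2$ over $\mathbb{F}_3$, with the $4 \times 4$ orthogonal pair of Example 3.128, giving a $12 \times 12$ orthogonal pair. Entries of a product are stored as ordered pairs, and the names `kronecker`, `is_latin_square`, and `are_orthogonal` are hypothetical.

```python
def kronecker(a, b):
    """Kronecker product of a k x k matrix a and an l x l matrix b.

    The block in position (i, j) is a[i][j] B, whose entries are the
    ordered pairs (a[i][j], b[r][s]).
    """
    k, l = len(a), len(b)
    return [[(a[i][j], b[r][s]) for j in range(k) for s in range(l)]
            for i in range(k) for r in range(l)]


def is_latin_square(m):
    n = len(m)
    symbols = set(m[0])
    rows_ok = all(len(row) == n and set(row) == symbols for row in m)
    cols_ok = all({row[j] for row in m} == symbols for j in range(n))
    return len(symbols) == n and rows_ok and cols_ok


def are_orthogonal(m1, m2):
    pairs = {(u, v) for r1, r2 in zip(m1, m2) for u, v in zip(r1, r2)}
    return len(pairs) == len(m1) ** 2


# A 3 x 3 orthogonal pair: L_1 and L_2 over F_3.
A3 = [[(x + y) % 3 for y in range(3)] for x in range(3)]
A3p = [[(2 * x + y) % 3 for y in range(3)] for x in range(3)]
# The 4 x 4 orthogonal pair of Example 3.128.
B4 = [[0, 1, 2, 3], [1, 0, 3, 2], [2, 3, 0, 1], [3, 2, 1, 0]]
B4p = [[0, 1, 2, 3], [2, 3, 0, 1], [3, 2, 1, 0], [1, 0, 3, 2]]

big = kronecker(A3, B4)
big_p = kronecker(A3p, B4p)
print(len(big), is_latin_square(big), is_latin_square(big_p),
      are_orthogonal(big, big_p))  # 12 True True True
```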

The smallest $n > 2$ not covered by Euler's theorem is $n = 6$, and this is why
Euler posed the question of the 36 officers. Indeed, he conjectured that there is
no orthogonal pair of $n \times n$ Latin squares if $n \equiv 2 \bmod 4$. In 1901, G. Tarry
proved that there does not exist an orthogonal pair of $6 \times 6$ Latin squares, thus
answering Euler's question posed at the beginning of this section: there is no
such formation of 36 officers. However, in 1958, E. T. Parker discovered an
orthogonal pair of $10 \times 10$ Latin squares, thereby disproving Euler's conjecture.
Parker's example is displayed on the front cover of this book. Table 3.1 is a less
colorful version of it; note that every number less than 100 appears, in decimal
notation, as an entry. Parker, R. C. Bose, and S. S. Shrikhande went on to prove
that there exists a pair of orthogonal $n \times n$ Latin squares for all $n$ except 2 and 6.
00  15  23  32  46  51  64  79  87  98
94  77  10  25  52  49  01  83  68  36
71  34  88  17  20  02  43  65  96  59
45  81  54  66  18  27  72  90  39  03
82  40  61  04  99  16  28  37  53  75
26  62  47  91  74  33  19  58  05  80
13  29  92  48  31  84  55  06  70  67
69  93  35  50  07  78  86  44  12  21
57  08  76  89  63  95  30  11  24  42
38  56  09  73  85  60  97  22  41  14

Table 3.1.
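
The claims made about Table 3.1 can be verified mechanically. In the sketch below (an illustration only), the table is read row by row, the tens digits are taken as one square and the units digits as the other; the variable and function names are hypothetical.

```python
PARKER = """
00 15 23 32 46 51 64 79 87 98
94 77 10 25 52 49 01 83 68 36
71 34 88 17 20 02 43 65 96 59
45 81 54 66 18 27 72 90 39 03
82 40 61 04 99 16 28 37 53 75
26 62 47 91 74 33 19 58 05 80
13 29 92 48 31 84 55 06 70 67
69 93 35 50 07 78 86 44 12 21
57 08 76 89 63 95 30 11 24 42
38 56 09 73 85 60 97 22 41 14
""".split()

n = 10
table = [PARKER[i * n:(i + 1) * n] for i in range(n)]
tens = [[int(entry[0]) for entry in row] for row in table]   # first Latin square
units = [[int(entry[1]) for entry in row] for row in table]  # second Latin square


def is_latin_square(m):
    size = len(m)
    symbols = set(m[0])
    rows_ok = all(len(row) == size and set(row) == symbols for row in m)
    cols_ok = all({row[j] for row in m} == symbols for j in range(size))
    return len(symbols) == size and rows_ok and cols_ok


# Orthogonality: all 100 two-digit entries are distinct.
orthogonal = len(set(PARKER)) == n * n
every_number_appears = sorted(int(e) for e in PARKER) == list(range(100))
print(is_latin_square(tens), is_latin_square(units),
      orthogonal, every_number_appears)  # True True True True
```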


Magic 

We are now going to use orthogonal Latin squares to construct some magic 
squares. 

Definition. An $n \times n$ magic square is an $n \times n$ matrix $A = [a_{ij}]$ whose entries
consist of all the numbers $0, 1, \ldots, n^2 - 1$ and whose row sums and column
sums are the same; that is, there is a number $\sigma$, called the magic number, with

$$\sum_{j=1}^{n} a_{ij} = \sigma \text{ for all } i \quad \text{and} \quad \sum_{i=1}^{n} a_{ij} = \sigma \text{ for all } j.$$

The 1514 engraving Melencolia I, by Albrecht Dürer (see Figure 3.2), contains
the following square in its upper right corner.

16   3   2  13
 5  10  11   8
 9   6   7  12
 4  15  14   1

Notice the date 1514 in the bottom row.¹⁵ The row and column sums all equal 34;
in fact, the sum $\sum_i a_{ii}$ of the diagonal terms is also 34, as is the sum $\sum_i a_{i,\,n-i+1}$
of the terms on the back diagonal (going up from the bottom left corner to the
top right corner). This is not a magic square, for its entries range from 1 to 16
instead of from 0 to 15, but this is easily remedied: subtract 1 from each entry
and get a magic square with magic number 30.

¹⁵ Dürer was familiar with Qabala, Hebrew mysticism, in which each letter of the alphabet
is assigned a numeric value, and each word is assigned the value which is the sum of the values
of its letters. The values assigned to the letters of the Latin alphabet are 1, 2, ..., 26. Notice
that 4 and 1 flank 1514 in the magic square; these are the initials of the artist Dürer, Albrecht.
Figure 3.2. Melencolia I, by Albrecht Dürer
(Grunwald Center for the Graphic Arts, UCLA Hammer Museum)





Proposition 3.135. If $A$ is an $n \times n$ magic square, then its magic number is

$$\sigma = \tfrac{1}{2} n(n^2 - 1).$$

Proof. If $\rho_i$ denotes the sum of the entries in the $i$th row of $A$, then $\rho_i = \sigma$ for
all $i$, and so

$$\sum_{i=1}^{n} \rho_i = n\sigma.$$

But this last number is the sum of all the entries in $A$; that is,

$$n\sigma = 1 + 2 + \cdots + (n^2 - 1) = \tfrac{1}{2}(n^2 - 1)n^2.$$

Therefore, $\sigma = \tfrac{1}{2} n(n^2 - 1)$. •

If $n = 4$, then $\sigma = \tfrac{1}{2} \cdot 4 \cdot 15 = 30$.

There is a minor disagreement about terminology. In order that a square be
magic, some authors also require that the diagonal entries and the back diagonal
entries each add up to the magic number, as in the modified Dürer square.

Definition. A diabolic square is a magic square whose diagonal and back 
diagonal sums are each equal to the magic number. 

We will construct some diabolic squares below, but let us first return to magic 
squares. There are many methods of constructing magic squares. For example, 
in 1693, De la Loubere showed how to construct an n x n magic square, for any 
odd n, in which 0 can occur in any i j position (see Stark, An Introduction to 
Number Theory, Chapter 4). We now use orthogonal Latin squares to construct 
magic squares. 

Proposition 3.136. If $A = [a_{ij}]$ and $B = [b_{ij}]$ are orthogonal Latin squares
with entries in $0, 1, \ldots, n - 1$, then the matrix $M = [a_{ij}n + b_{ij}]$ is an $n \times n$
magic square.

Proof. Since $A$ and $B$ are orthogonal, the entries $(a_{ij}, b_{ij})$ of their Hadamard
product $A \circ B$ are all distinct. It follows from Proposition 1.44, which says that
the $n$-adic digits of a non-negative number are unique, that every number from
0 through $n^2 - 1$ occurs in $M$ (note that $0 \le a_{ij} < n$ and $0 \le b_{ij} < n$). Now
$A$ being a Latin square says that each row and column of $A$ is a permutation of
$0, 1, \ldots, n - 1$, and so each row sum and column sum equals

$$s = \sum_{i=0}^{n-1} i = \tfrac{1}{2}(n - 1)n;$$

similarly, each row sum and column sum of $B$ equals $s$. Therefore,
each row sum of $M$ is equal to $sn + s$, as is each column sum. Therefore, $M$ is a
magic square. •

The magic number of $M$ is $\sigma = s(n + 1)$, where $s = \tfrac{1}{2} n(n - 1)$. This
agrees with the value of the magic number in Proposition 3.135, for
$s(n + 1) = \tfrac{1}{2} n(n - 1)(n + 1) = \tfrac{1}{2} n(n^2 - 1)$.
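The construction in Proposition 3.136 can be tried out on a small case. The following is only a sketch: the 3 x 3 orthogonal pair `A`, `B` and the helper names (`is_latin`, `are_orthogonal`, `magic_from_latin`) are chosen here for illustration and do not come from the text.

```python
def is_latin(square):
    """True if every row and every column is a permutation of 0..n-1."""
    n = len(square)
    symbols = set(range(n))
    rows_ok = all(set(row) == symbols for row in square)
    cols_ok = all({square[i][j] for i in range(n)} == symbols for j in range(n))
    return rows_ok and cols_ok


def are_orthogonal(A, B):
    """True if the n^2 ordered pairs (a_ij, b_ij) are all distinct."""
    n = len(A)
    pairs = {(A[i][j], B[i][j]) for i in range(n) for j in range(n)}
    return len(pairs) == n * n


def magic_from_latin(A, B):
    """Form M = [a_ij * n + b_ij], as in Proposition 3.136."""
    n = len(A)
    return [[A[i][j] * n + B[i][j] for j in range(n)] for i in range(n)]


# Illustrative orthogonal pair: a_ij = i + j mod 3 and b_ij = 2i + j mod 3.
A = [[0, 1, 2],
     [1, 2, 0],
     [2, 0, 1]]
B = [[0, 1, 2],
     [2, 0, 1],
     [1, 2, 0]]

M = magic_from_latin(A, B)
n = len(M)
sigma = n * (n * n - 1) // 2                      # magic number of Proposition 3.135
row_sums = [sum(row) for row in M]
col_sums = [sum(M[i][j] for i in range(n)) for j in range(n)]
```

Here `M` is `[[0, 4, 8], [5, 6, 1], [7, 2, 3]]`, and every row and column sums to `sigma = 12`.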

Parker’s 10 x 10 orthogonal Latin squares have been converted into deci- 
mal digits in Table 3.1, which is an example of a 10 x 10 magic square as just 
constructed. 


Example 3.137. 

Proposition 3.136 is not the only way to construct magic squares. For example, 
here is a 6 x 6 magic square (whose magic number is, of course, 105); it is even 
diabolic. This magic square does not arise from an orthogonal pair of 6 x 6 Latin 
squares, for Tarry has shown us that there aren’t any! 


    34    0    5   25   18   23
     2   31    6   20   22   24
    30    8    1   21   26   19
     7   27   32   16    9   14
    29    4   33   11   13   15
     3   35   28   12   17   10
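The claims made for the square in Example 3.137 can be checked mechanically. The sketch below reads the third entry of the fifth row as 33, which is the only reading for which each of 0 through 35 occurs exactly once; the function names are illustrative.

```python
SQUARE = [
    [34,  0,  5, 25, 18, 23],
    [ 2, 31,  6, 20, 22, 24],
    [30,  8,  1, 21, 26, 19],
    [ 7, 27, 32, 16,  9, 14],
    [29,  4, 33, 11, 13, 15],
    [ 3, 35, 28, 12, 17, 10],
]


def line_sums(M):
    """Return (row sums, column sums, diagonal sum, back diagonal sum)."""
    n = len(M)
    rows = [sum(row) for row in M]
    cols = [sum(M[i][j] for i in range(n)) for j in range(n)]
    diag = sum(M[i][i] for i in range(n))
    back = sum(M[n - 1 - i][i] for i in range(n))   # bottom left to top right
    return rows, cols, diag, back


def is_diabolic(M):
    """Entries are 0..n^2-1 once each, and all lines have the magic sum."""
    n = len(M)
    sigma = n * (n * n - 1) // 2
    entries = sorted(x for row in M for x in row)
    rows, cols, diag, back = line_sums(M)
    return (entries == list(range(n * n))
            and all(s == sigma for s in rows + cols)
            and diag == sigma and back == sigma)
```

For this square, `is_diabolic(SQUARE)` holds with magic number 105.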


We now construct some diabolic squares from orthogonal Latin squares. It 
is known that n x n diabolic squares exist for all n > 3, but we will construct 
them for only certain n. 


Definition. An $n \times n$ Latin square $A = [a_{ij}]$ with entries in a set $X$ with $|X| = n$
is a diagonal Latin square if its diagonal and its back diagonal are permutations
of $X$.


Lemma 3.138. If $n \in \mathbb{N}$ is an odd integer that is not a multiple of 3, then there
exists an orthogonal pair of $n \times n$ diagonal Latin squares.

Proof. Given $n$, the method we use requires positive integers $a > b$ with each
of $a$, $b$, $a - b$, and $a + b$ being relatively prime to $n$. If we choose $a = 2$ and
$b = 1$, then $(2, n) = 1$ forces $n$ to be odd, $b = 1 = a - b$ does not constrain $n$,
but $a + b = 3$ forces $n$ not to be a multiple of 3.

First, we construct a diagonal $n \times n$ Latin square. It will be convenient to
label the rows and columns so that $0 \le i, j \le n - 1$. Define $A$ to be the $n \times n$
matrix whose $ij$ entry is the congruence class $[ib + ja] \bmod n$; we simplify
notation by omitting the brackets of the entries. Thus,

$$
A = \begin{bmatrix}
0 & a & 2a & \cdots & (n-1)a \\
b & b + a & b + 2a & \cdots & b + (n-1)a \\
2b & 2b + a & 2b + 2a & \cdots & 2b + (n-1)a \\
\vdots & \vdots & \vdots & & \vdots \\
(n-1)b & (n-1)b + a & (n-1)b + 2a & \cdots & (n-1)b + (n-1)a
\end{bmatrix}.
$$


We now show that $A$ is a diagonal Latin square; remember that its entries lie in
$\mathbb{I}_n$. Each row is a permutation: for fixed $i$, if $ib + ja = ib + j'a$, then
$(j - j')a \equiv 0 \bmod n$. But $(a, n) = 1$ permits cancellation, so that $[j] = [j']$. Each column
is a permutation: for fixed $j$, if $ib + ja = i'b + ja$, then $(i - i')b \equiv 0 \bmod n$,
and so $(b, n) = 1$ gives $[i] = [i']$. For the main diagonal, if $ib + ia = i'b + i'a$,
then $i(b + a) = i'(b + a)$, so that $(b + a, n) = 1$ gives $[i] = [i']$. Finally, for the
back diagonal, if $ib + (n - i)a = i'b + (n - i')a$, then $i(b - a) = i'(b - a)$, and
so $(b - a, n) = (a - b, n) = 1$ gives $[i] = [i']$.

It is obvious that the transpose $A^T$ of $A$ is also a diagonal Latin square, and
we now show that $A$ and $A^T$ are orthogonal.¹⁶ Note that the $ij$ entry of $A^T$ is
$jb + ia$, so that the $ij$ entry of the Hadamard product $A \circ A^T$ is $(ib + ja, jb + ia)$.
To check orthogonality, suppose that $(ib + ja, jb + ia) = (i'b + j'a, j'b + i'a)$.
Now $ib + ja = i'b + j'a$ and $jb + ia = j'b + i'a$ (remember that entries lie in
$\mathbb{I}_n$), so that $[(i - i')a] = [(j' - j)b]$ and $[(j' - j)a] = [(i - i')b]$. Multiplying
the first equation by $[b]$ and the second by $[a]$ gives
$[(j' - j)a^2] = [(i - i')ab] = [(j' - j)b^2]$. Now $a = 2$ and $b = 1$, so that
$[4(j' - j)] = [j' - j]$ and, hence, $3(j' - j) \equiv 0 \bmod n$. But $(3, n) = 1$, so that
$j' - j \equiv 0 \bmod n$ and $[j] = [j']$. A similar argument gives $[i] = [i']$. •
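The construction in the proof of Lemma 3.138 is easy to experiment with. The sketch below builds the matrix with $ij$ entry $ib + ja \bmod n$ (defaulting to $a = 2$, $b = 1$, as in the proof) and checks the three properties claimed; the function names are hypothetical.

```python
from math import gcd


def diagonal_latin(n, a=2, b=1):
    """n x n matrix with ij entry (i*b + j*a) mod n, indices running over 0..n-1."""
    for value in (a, b, a - b, a + b):
        if gcd(value, n) != 1:
            raise ValueError("a, b, a-b, a+b must each be relatively prime to n")
    return [[(i * b + j * a) % n for j in range(n)] for i in range(n)]


def transpose(A):
    return [list(col) for col in zip(*A)]


def is_diagonal_latin(A):
    """Rows, columns, diagonal, and back diagonal are permutations of 0..n-1."""
    n = len(A)
    symbols = set(range(n))
    lines = [set(row) for row in A]
    lines += [set(col) for col in zip(*A)]
    lines.append({A[i][i] for i in range(n)})
    lines.append({A[i][n - 1 - i] for i in range(n)})
    return all(line == symbols for line in lines)


def are_orthogonal(A, B):
    """All n^2 ordered pairs (a_ij, b_ij) are distinct."""
    n = len(A)
    return len({(A[i][j], B[i][j]) for i in range(n) for j in range(n)}) == n * n


A5 = diagonal_latin(5)          # first row is 0, 2, 4, 1, 3
```

For $n = 9$ the call `diagonal_latin(9)` raises `ValueError`, reflecting the hypothesis that $n$ not be a multiple of 3.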


Proposition 3.139. If $n \in \mathbb{N}$ is an odd integer that is not a multiple of 3, then
there exists an $n \times n$ diabolic square.

Proof. Let $A = [a_{ij}]$ and $B = [b_{ij}]$ be diagonal orthogonal $n \times n$ Latin squares,
which exist, by Lemma 3.138. By Proposition 3.136, the matrix $M = [a_{ij}n + b_{ij}]$
is a magic square with magic number $\sigma = s(n + 1)$, where $s = \sum_{i=0}^{n-1} i$. Since the
main diagonals of $A$ and of $B$ are permutations of $\{0, 1, \ldots, n - 1\}$, the sum of
the diagonal terms is $s(n + 1)$, and the same is true of the back diagonal. •
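Combining Lemma 3.138 with Proposition 3.136 gives an explicit recipe. As a sketch (with $a = 2$, $b = 1$, and hypothetical function names), one can take $A$ with $ij$ entry $i + 2j \bmod n$, take $B = A^T$, and form $M = [a_{ij}n + b_{ij}]$:

```python
def diabolic_square(n):
    """Sketch of Proposition 3.139 for odd n that is not a multiple of 3."""
    if n % 2 == 0 or n % 3 == 0:
        raise ValueError("construction needs n odd and not a multiple of 3")
    # A has ij entry i + 2j (mod n); its transpose B has ij entry 2i + j (mod n).
    return [[((i + 2 * j) % n) * n + (2 * i + j) % n for j in range(n)]
            for i in range(n)]


def all_line_sums(M):
    """Sums of all rows, all columns, the diagonal, and the back diagonal."""
    n = len(M)
    sums = [sum(row) for row in M]
    sums += [sum(col) for col in zip(*M)]
    sums.append(sum(M[i][i] for i in range(n)))
    sums.append(sum(M[i][n - 1 - i] for i in range(n)))
    return sums


M5 = diabolic_square(5)         # first row is 0, 11, 22, 8, 19
```

Every entry of `all_line_sums(M5)` equals the magic number $\tfrac{1}{2} \cdot 5 \cdot 24 = 60$.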


Fertilizer 


16 If $A$ is a Latin square, it is not always true that $A$ and $A^T$ are orthogonal. For example,
the 4 x 4 Latin square $A$ in Example 3.130 has all 0's on its main diagonal, and so does its
transpose. Since all the diagonal entries of $A \circ A^T$ equal (0, 0), the Latin squares $A$ and $A^T$
are not orthogonal.

Here is a fertilizer story that will ultimately be seen to be related to Latin
squares. To maximize his corn production, a farmer has to choose the best type 
of seed. But he knows that the amount of fertilizer also affects his crop. How 
can he design an experiment to show him what is the best combination? We give 
a simple illustration. Suppose there are three types of seed: A, B, and C. To 
measure the effect of using different amounts of fertilizer, the farmer can divide 
a plot into 9 subplots, as follows: 


Amount of Fertilizer        Seed Type
  High                      A    B    C
  Medium                    A    B    C
  Low                       A    B    C


In each position, an observation $x_{sf}$ is made, where $x_{sf}$ is the number of ears
harvested according to the seed type $s$ and level $f$ of fertilizer.

The farmer now wants to see the effect of differing dosages of pesticide. He
could have 27 observations $x_{sfp}$ (more generally, if he had $n$ different dosages
and $n$ different seed types, there would be $n^3$ observations). On the other hand,
suppose he arranges his experiment as follows (again, we illustrate with $n = 3$).


                            Amount of Pesticide
Amount of Fertilizer        High    Medium    Low
  High                      A       B         C
  Medium                    C       A         B
  Low                       B       C         A


The seed types are now arranged in a Latin square. For example, the observation
from the northwest subplot is the number of ears from seed type A, with a high
level of fertilizer, and high level of pesticide. There are only 9 observations
instead of 27 (more generally, there are $n^2$ observations instead of $n^3$). Obviously
we do not have all possible observations. To infer properties about a large
collection from measuring a small sample is what statistics is all about. And it turns
out that the Latin square organization of data gives essentially the same statistical
information as that given by the complete set of all $n^3$ observations. A discussion
of the analysis of variance for such designs can be found, for example, in Li, An
Introduction to Experimental Statistics.

The farmer now wants to consider water amounts. Again we illustrate with
$n = 3$. In addition to the seed types A, B, C, and the various levels of fertilizer
and pesticide, let there be three water levels: $\alpha > \beta > \gamma$.

                            Amount of Pesticide
Amount of Fertilizer        High    Medium    Low
  High                      Aα      Bβ        Cγ
  Medium                    Cβ      Aγ        Bα
  Low                       Bγ      Cα        Aβ


The observation from the northwest subplot, for example, is the number of
ears of seed type A, high level of fertilizer, high level of pesticide, and high level
of water. Again, the statistical data arising from this small number, namely 9, of
observations are essentially the same as what one would get from 81 observations
$x_{sfpw}$ (more generally, $n^2$ observations instead of $n^4$). Euler called such matrices
Graeco-Latin squares, because he described them, as above, using Latin and
Greek fonts; he coined the term Latin square for this same notational reason. We
recognize an orthogonal pair of Latin squares. One could test more variables if
one could find an orthogonal set of Latin squares as defined below.
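That the farmer's design is a Graeco-Latin square can be confirmed by superimposing the two squares and counting distinct pairs. The sketch below takes the water levels in the nine cells to be α, β, γ / β, γ, α / γ, α, β (spelled out as strings); the names are illustrative.

```python
LATIN = [["A", "B", "C"],
         ["C", "A", "B"],
         ["B", "C", "A"]]
GREEK = [["alpha", "beta", "gamma"],
         ["beta", "gamma", "alpha"],
         ["gamma", "alpha", "beta"]]


def superimpose(L, G):
    """Cell-by-cell pairs (Latin letter, Greek letter)."""
    n = len(L)
    return [[(L[i][j], G[i][j]) for j in range(n)] for i in range(n)]


def is_graeco_latin(L, G):
    """The two squares are orthogonal: all n^2 superimposed pairs are distinct."""
    n = len(L)
    cells = [pair for row in superimpose(L, G) for pair in row]
    return len(set(cells)) == n * n


observations_needed = len(LATIN) ** 2     # 9 subplots in the design
full_factorial = len(LATIN) ** 4          # 81 = 3^4 combinations of s, f, p, w
```

Each seed type meets each water level exactly once, which is what lets 9 subplots stand in for the 81 combinations.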

Definition. A set $A_1, A_2, \ldots, A_t$ of $n \times n$ Latin squares is an orthogonal set
if each pair of them is orthogonal.

Lemma 3.140. If $A_1, A_2, \ldots, A_t$ is an orthogonal set of $n \times n$ Latin squares,
then $t \le n - 1$.

Proof. There is no loss in generality in assuming that each $A_\nu$ has entries lying
in $X = \{0, 1, \ldots, n - 1\}$. Permute the entries of $A_1$ so that its first row is
$0, 1, \ldots, n - 1$ in this order. By Lemma 3.131, this new matrix $A'_1$ is a Latin
square which is orthogonal to each of $A_2, \ldots, A_t$. Now permute the entries of
$A_2$ so that its first row is $0, 1, \ldots, n - 1$ in this order. This new matrix $A'_2$ is a
Latin square, and it is orthogonal to each of $A'_1, A_3, \ldots, A_t$. Continuing in this
way, we may assume that the top row of each $A_\nu$ is $0, 1, \ldots, n - 1$ in this order.
If $\nu \ne \lambda$, then the first row of $A_\nu \circ A_\lambda$, their Hadamard product, is

$$(0, 0), (1, 1), \ldots, (n - 1, n - 1).$$

We claim that $A_\nu$ and $A_\lambda$ do not have the same 2, 1 entry. Otherwise, there is
some $k$ with $a^\nu_{21} = k = a^\lambda_{21}$ (where $a^\nu_{ij}$ denotes the $ij$ entry of $A_\nu$), so that

$$(a^\nu_{21}, a^\lambda_{21}) = (k, k).$$

This contradicts the orthogonality of $A_\nu$ and $A_\lambda$, for the ordered pair $(k, k)$
already occurs in the first row of $A_\nu \circ A_\lambda$ as $(a^\nu_{1,k+1}, a^\lambda_{1,k+1})$. Therefore, distinct $A_\nu$ have
distinct entries in the 2, 1 position. In any $A_\nu$, however, there are only $n - 1$
choices for its 2, 1 entry, because 0 already occurs in its 1, 1 position, and so
there are at most $n - 1$ distinct $A_\nu$'s. •

Definition. A complete orthogonal set of n x n Latin squares is an orthogonal 
set of n — 1 Latin squares. 

Theorem 3.141. If $q = p^e$, then there exists a complete orthogonal set of $q - 1$
$q \times q$ Latin squares.

Proof. If $k$ is a finite field with $q$ elements, then there are $q - 1$ elements $a \in k^\times$,
and so there are $q - 1$ Latin squares $L_a$, each pair of which is orthogonal, by
Theorem 3.132. •
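For a prime $p$ (the case $e = 1$), the theorem can be illustrated with integers mod $p$. The sketch below assumes that $L_a$ is the square whose $ij$ entry is $ai + j$ computed in $\mathbb{F}_p$; that formula is an assumption made here, since Theorem 3.132 is not restated in this passage, and a general prime power $q$ would need genuine finite field arithmetic, which the sketch does not attempt.

```python
from itertools import combinations


def latin_square(a, p):
    """Assumed form of L_a over F_p: ij entry is a*i + j mod p, with a nonzero."""
    return [[(a * i + j) % p for j in range(p)] for i in range(p)]


def is_latin(L):
    n = len(L)
    symbols = set(range(n))
    return (all(set(row) == symbols for row in L)
            and all(set(col) == symbols for col in zip(*L)))


def are_orthogonal(A, B):
    n = len(A)
    return len({(A[i][j], B[i][j]) for i in range(n) for j in range(n)}) == n * n


def complete_orthogonal_set(p):
    """The p - 1 squares L_1, ..., L_{p-1}; p is assumed prime."""
    return [latin_square(a, p) for a in range(1, p)]


squares = complete_orthogonal_set(5)
pairwise_ok = all(are_orthogonal(A, B) for A, B in combinations(squares, 2))
```

With $p = 5$ this yields 4 pairwise orthogonal Latin squares, matching the bound $t \le n - 1$ of Lemma 3.140.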

One Latin square can test two variables (e.g., levels of fertilizer and pesti- 
cide) on different varieties (e.g., of seed). A Graeco-Latin square, i.e., a pair of 
orthogonal Latin squares, allows testing for a third variable (e.g., levels of wa- 
ter). More generally, a set of t orthogonal Latin squares allows one to test levels 
of t + 1 different variables on different varieties. 


Horizons 

And now another stream enters the story. By the early 1800s, mathematicians
were studying the problems of perspective arising from artists painting
pictures of three-dimensional scenes on two-dimensional canvases. To the eye,
parallel lines seem to meet at the horizon, and this suggests adjoining a new
construct, a "line at infinity," to the ordinary plane. Every line is parallel to a line
$\ell$ passing through the origin $O$. For each such line, define a new point, $\omega_\ell$, and
"lengthen" every line parallel to $\ell$ by adjoining this new point to it. Finally, we
decree that all the new points $\omega_\ell$, for all lines $\ell$ through the origin, comprise a
new line, the line at infinity, or the horizon. If $\ell_1$ and $\ell_2$ are (lengthened)
parallel lines, and if $\ell$ is the line through $O$ parallel to each of them, then $\ell_1$ and $\ell_2$
intersect in the point $\omega_\ell$. The reader may check that the familiar property that
every two points determine a unique line¹⁷ is now accompanied by the property
that every two lines determine a unique point: it is the usual point of intersection
if the lines are not parallel, and it is a point $\omega_\ell$ on the horizon if the lines are
parallel.

Since we are now interested in finite structures, let us replace the plane
$\mathbb{R} \times \mathbb{R}$ by a finite "plane" $k \times k$, where $k$ is a finite field with $q$ elements.
We regard this finite plane as the direct product. Define a line $\ell$ through the
origin $O = (0, 0)$ to be a subset of the form

$$\ell = \{(ax, ay) : a \in k \text{ and } (x, y) \ne O\},$$

and, more generally, define a line to be a coset

$$(u, v) + \ell = \{(u + ax, v + ay) : a \in k\}.$$

17 This is the reason we do not adjoin a new point to lines which are not parallel.

Since $k$ is finite, we can do some counting. There are $q^2$ points in the plane,
and there are $q$ points on every line. As usual, two points determine a line. Call
two lines parallel if they do not intersect, and say that two lines have the same
direction if they are parallel. How many directions are there? Every line, being a
coset of a line $\ell$ through the origin, has the same direction as $\ell$, whereas distinct
lines through the origin have different directions, for they intersect. Thus, the
number of directions is the same as the number of lines through the origin. There
are $q^2 - 1$ points $V \ne O$, each of which determines a line $\ell = OV$ through the
origin. Since there are $q$ points on $\ell$, there are $q - 1$ points on $\ell$ other than $O$,
and each of them determines $\ell$. There are thus

$$(q^2 - 1)/(q - 1) = q + 1$$

directions. We adjoin $q + 1$ new points $\omega_\ell$ to $k \times k$, one for each direction; that
is, one for each line $\ell$ through the origin. Define $\omega$, the line at infinity, by

$$\omega = \{\omega_\ell : \ell \text{ is a line through the origin}\},$$

and define the projective plane over $k$:

$$P(k) = (k \times k) \cup \omega.$$

Define a (projective) line in $P(k)$ to be either $\omega$ or an old line $(u, v) + \ell$ in $k \times k$
with the point $\omega_\ell$ adjoined, where $\ell$ is a line through the origin. It follows that
$|P(k)| = q^2 + q + 1$, every line has $q + 1$ points, and any two points determine
a unique line. (In Example 4.26, we shall use linear algebra to give another
construction of a projective plane.)
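The counting above can be verified by brute force when $k = \mathbb{F}_p$ for a prime $p$. The following sketch represents each new point $\omega_\ell$ as a tagged pair `("inf", d)`, where `d` is a chosen direction vector of $\ell$; this encoding and the function names are assumptions made for illustration.

```python
from itertools import combinations


def projective_plane(p):
    """Points and lines of P(F_p) for a prime p, following the construction above."""
    affine = [(x, y) for x in range(p) for y in range(p)]
    # One direction vector per line through the origin:
    # (1, m) for each slope m, and (0, 1) for the vertical line.
    directions = [(1, m) for m in range(p)] + [(0, 1)]
    infinity = [("inf", d) for d in directions]       # the new points omega_l
    lines = set()
    for d in directions:
        dx, dy = d
        for (u, v) in affine:
            coset = frozenset(((u + a * dx) % p, (v + a * dy) % p)
                              for a in range(p))
            lines.add(coset | {("inf", d)})           # lengthen the affine line
    lines.add(frozenset(infinity))                    # the line at infinity
    return affine + infinity, lines


def lines_through(lines, P, Q):
    """All lines containing both points P and Q."""
    return [line for line in lines if P in line and Q in line]


points, lines = projective_plane(3)
```

For $p = 3$ there are $13 = 3^2 + 3 + 1$ points and 13 lines, each line has 4 points, and `lines_through` returns exactly one line for every pair of distinct points.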

Example 3.142. 

If $k = \mathbb{F}_2$, then $k \times k$ has 4 points: $O = (0, 0)$, $a = (1, 0)$, $b = (0, 1)$, and
$c = (1, 1)$, and 6 lines, each with two points, as in Figure 3.3.

There are three sets of parallel lines: $Oa$ and $bc$, $Ob$ and $ac$, and $Oc$ and $ab$.
The projective plane $P(\mathbb{F}_2)$ is obtained by adding new points $\omega_1, \omega_2, \omega_3$ and
forcing parallel lines to meet. There are now 7 lines: the 6 original lines (each
lengthened) and the line at infinity $\{\omega_1, \omega_2, \omega_3\}$. ◄

We now abstract the features we need.

Figure 3.3 Affine Plane        Figure 3.4 Projective Plane


Definition. A projective plane of order $n$ is a set $X$ with $|X| = n^2 + n + 1$,
a family of subsets called lines, each having $n + 1$ points, such that every two
points determine a unique line.

We have seen above that if k is a finite field with q elements, then P (k) is a 
projective plane of order q. It is possible to construct projective planes without 
using finite fields. For example, it is known that there are four projective planes 
of order 9, only one of which arises from the finite field with 9 elements. 

The following theorem is the reason we have introduced projective planes. 

Theorem. If $n > 3$, then there exists a projective plane of order $n$ if and only
if there exists a complete orthogonal set of $n \times n$ Latin squares.

Proof. See Ryser, Combinatorial Mathematics, p. 92. • 

A natural question is to find those n for which there exists a projective plane 
of order n. Notice that this is harder than Euler’s original question; instead of 
asking whether there is an orthogonal pair of n x n Latin squares, we are now 
asking whether there is an orthogonal set of $n - 1$ $n \times n$ Latin squares. If $n = p^e$,
then we have constructed a projective plane of order n above. Since Tarry proved 
that there is no orthogonal pair of 6 x 6 Latin squares, there is no set of 5 pairwise 
orthogonal 6x6 Latin squares, and so there is no projective plane of order 6. 
The following theorem was proved in 1949. 

Theorem (Bruck-Ryser). If either $n \equiv 1 \bmod 4$ or $n \equiv 2 \bmod 4$ and, further,
if $n$ is not a sum of two squares, then there does not exist a projective plane of
order $n$.

Proof. See Ryser, Combinatorial Mathematics, p. 111. •

The first few $n \equiv 1$ or $2 \bmod 4$ with $n > 3$ are

5, 6, 9, 10, 13, 14, 17, 18, 21, 22.

Some of these are primes or prime powers, and so they must be sums of two
squares¹⁸ because projective planes of these orders do exist; and so it is:

5 = 1 + 4; 9 = 0 + 9; 13 = 4 + 9; 17 = 1 + 16.

Of the remaining numbers, 10 = 1 + 9 and 18 = 9 + 9 are sums of two squares
(and the theorem does not apply), but the others are not. It follows that there is
no projective plane of order 6, 14, 21, or 22 (thus, Tarry's result follows from the
Bruck-Ryser theorem).
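The bookkeeping in this paragraph is simple to reproduce. The sketch below applies only the hypothesis of the Bruck-Ryser theorem as stated above; the function names are illustrative.

```python
from math import isqrt


def is_sum_of_two_squares(n):
    """True if n = a^2 + b^2 for some integers a, b >= 0."""
    return any(isqrt(n - a * a) ** 2 == n - a * a for a in range(isqrt(n) + 1))


def excluded_by_bruck_ryser(n):
    """True if the theorem rules out a projective plane of order n."""
    return n % 4 in (1, 2) and not is_sum_of_two_squares(n)


candidates = [n for n in range(3, 23) if n % 4 in (1, 2)]
excluded = [n for n in candidates if excluded_by_bruck_ryser(n)]
```

Here `candidates` is the list 5, 6, 9, 10, 13, 14, 17, 18, 21, 22 and `excluded` is 6, 14, 21, 22. A return value of `False` means only that the theorem is silent, as for $n = 10$ and $n = 12$.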

The smallest $n$ that is neither a prime power nor excluded by the Bruck-Ryser
theorem is $n = 10$. The question whether there exists a projective plane of order 10
was the subject of much investigation (after Tarry, 10 was also the first open case of
Euler's conjecture). This is a question about a set with 111 points, and so one would
expect that a computer could solve it quickly. But it is really a question about
11-point subsets of a set with 111 points, the order of magnitude of which is the
binomial coefficient $\binom{111}{11}$, a huge number. In spite of this, C. Lam was able
to show, in 1988, that there does not exist a projective plane of order 10. He
used a massive amount of calculation: 19,200 hours on VAX 11/780 followed
by 3000 hours on CRAY-1S. Thus, two and a half years of actual computer
running time (not counting the years of human thought and ingenuity involved in
instructing the machines) solved the problem. As of this writing, it is unknown
whether a projective plane of order 12 exists ($12 \equiv 0 \bmod 4$, and so it is not
covered by the Bruck-Ryser theorem).


18 Recall Theorem 3.83, Fermat's two-square theorem: if $p \equiv 1 \bmod 4$, then $p$ is a sum of
two squares. Since there exists a projective plane of order $p$, the Bruck-Ryser theorem implies
the two-square theorem. In fact, the Bruck-Ryser theorem implies that if $p \equiv 1 \bmod 4$, then
$p^e$ is a sum of two squares for all $e \ge 1$.
4.1 Vector Spaces

Linear algebra is the study of vector spaces and their homomorphisms, with 
applications to systems of linear equations. From now on, we are going to as- 
sume that most readers have had some course involving matrices, probably with 
real entries or with complex entries (we have already discussed elementary facts 
about 2 x 2 matrices in some detail). Nowadays, such courses usually deal with
computational aspects of the subject, but here we do not emphasize this impor- 
tant aspect of linear algebra. Instead, we discuss more theoretical properties of 
vector spaces (with scalars in any field) and linear transformations (which are 
homomorphisms between vector spaces and which are concretely described by 
matrices). When we discuss codes in Section 4.5, you will see how linear al- 
gebra with scalars in finite fields is used in an essential way to enable us to see 
photographs sent from outer space. 

Definition. If $k$ is a field, then a vector space over $k$ is an (additive) abelian
group $V$ equipped with a scalar multiplication: There is a function $k \times V \to V$,
denoted by $(a, v) \mapsto av$, such that, for all $a, b \in k$ and all $u, v \in V$,

(i) $a(u + v) = au + av$;

(ii) $(a + b)v = av + bv$;

(iii) $(ab)v = a(bv)$;

(iv) $1v = v$, where 1 is the one in $k$.

Besides the 5 axioms explicitly mentioned [scalar multiplication is defined
plus axioms (i) through (iv)], there are several more axioms implicit in the
statement that a vector space is an abelian group under addition. There is a function
$V \times V \to V$, denoted by $(u, v) \mapsto u + v$, satisfying the following equations for
all $u, v, w \in V$:

(i)' $(u + v) + w = u + (v + w)$;

(ii)' $u + v = v + u$;

(iii)' there is $0 \in V$ with $0 + v = v$;

(iv)' for each $v \in V$, there is $v' \in V$ with $v + v' = 0$.

Thus, the definition of vector space involves ten axioms.

Elements of $V$ are called vectors¹ and elements of $k$ are called scalars.

Example 4.1. 


(i) Euclidean space $\mathbb{R}^n$ is a vector space over $\mathbb{R}$. Vectors are $n$-tuples
$v = (a_1, \ldots, a_n)$, where $a_i \in \mathbb{R}$ for all $i$. Picture a vector $v$ as an arrow from
the origin to the point having coordinates $(a_1, \ldots, a_n)$. Addition is given
by

$$(a_1, \ldots, a_n) + (b_1, \ldots, b_n) = (a_1 + b_1, \ldots, a_n + b_n);$$

geometrically, the sum of two vectors is described by the parallelogram
law.

If $c \in \mathbb{R}$, then scalar multiplication by $c$ is given by

$$cv = c(a_1, \ldots, a_n) = (ca_1, \ldots, ca_n).$$

Scalar multiplication $v \mapsto cv$ "stretches" $v$ by a factor $|c|$, reversing its
direction when $c$ is negative (we put quotes around stretches because $cv$ is
shorter than $v$ when $|c| < 1$).

(ii) The example in part (i) can be generalized. If $k$ is any field, define $V = k^n$,
the set of all $n$-tuples $v = (a_1, \ldots, a_n)$, where $a_i \in k$ for all $i$. Addition
and scalar multiplication by $c \in k$ are given by the same formulas as in (i):

$$(a_1, \ldots, a_n) + (b_1, \ldots, b_n) = (a_1 + b_1, \ldots, a_n + b_n);$$

$$c(a_1, \ldots, a_n) = (ca_1, \ldots, ca_n).$$

(iii) If $R$ is a commutative ring and $k$ is a subring that is a field, then $R$ is a
vector space over $k$. Regard the elements of $R$ as vectors and the elements
of $k$ as scalars; define scalar multiplication $cv$, where $c \in k$ and $v \in R$, to
be the given product of two elements in $R$. Notice that the axioms in the
definition of vector space are just particular cases of some of the axioms
holding in the commutative ring $R$.

For example, if $k$ is a field, then the polynomial ring $R = k[x]$ is a
vector space over $k$. Vectors are polynomials $f(x)$, scalars are elements
$c \in k$, and scalar multiplication gives the polynomial $cf(x)$; that is, if

$$f(x) = b_n x^n + \cdots + b_1 x + b_0,$$

then

$$cf(x) = cb_n x^n + \cdots + cb_1 x + cb_0.$$

In particular, if a field $k$ is a subfield of a larger field $E$, then $E$ is a vector
space over $k$. For example, $\mathbb{C}$ is a vector space over $\mathbb{R}$.

(iv) If $k$ is a field, let $\mathrm{Mat}_{m \times n}(k)$ denote the set of all $m \times n$ matrices having
entries in $k$. Define the sum $A + B$ of two matrices $A$ and $B$ by adding
entries in the same position: if $A = [a_{ij}]$ and $B = [b_{ij}]$, then

$$A + B = [a_{ij} + b_{ij}].$$

If $c \in k$, then multiplying each entry of $A = [a_{ij}]$ by $c$ gives

$$cA = [ca_{ij}].$$

It is routine to check that $\mathrm{Mat}_{m \times n}(k)$ is a vector space over $k$.

If $m = n$, then we write $\mathrm{Mat}_n(k)$ instead of $\mathrm{Mat}_{n \times n}(k)$. ◄

1 The word vector comes from the Latin word meaning "to carry"; vectors in euclidean
space carry the data of length and direction. The word scalar comes from regarding $v \mapsto cv$
as a change of scale. The terms scale and scalar come from the Latin word meaning "ladder,"
for the rungs of a ladder are evenly spaced.
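When $k$ is a small finite field, the axioms for $V = k^n$ in part (ii) can be checked exhaustively. The following sketch takes $k = \mathbb{F}_3$ and $n = 2$; both values, and the names `add` and `scale`, are chosen here purely for illustration.

```python
from itertools import product

P = 3          # illustrative prime, so the scalars form the field F_3
N = 2          # illustrative length of the tuples


def add(u, v):
    """Coordinatewise addition in F_P^N."""
    return tuple((x + y) % P for x, y in zip(u, v))


def scale(c, v):
    """Scalar multiplication by c in F_P."""
    return tuple((c * x) % P for x in v)


scalars = range(P)
vectors = list(product(range(P), repeat=N))
zero = (0,) * N

# Axioms (i) through (iv) for scalar multiplication.
axioms_hold = all(
    scale(a, add(u, v)) == add(scale(a, u), scale(a, v))
    and scale((a + b) % P, v) == add(scale(a, v), scale(b, v))
    and scale((a * b) % P, v) == scale(a, scale(b, v))
    and scale(1, v) == v
    for a in scalars for b in scalars for u in vectors for v in vectors
)

# Axioms (i)' through (iv)' for addition; the negative of v is (P - 1) v.
group_axioms_hold = all(
    add(add(u, v), w) == add(u, add(v, w)) and add(u, v) == add(v, u)
    for u in vectors for v in vectors for w in vectors
) and all(add(zero, v) == v and add(v, scale(P - 1, v)) == zero for v in vectors)
```

A finite check of this kind is no substitute for the routine general proof, but it makes the ten axioms concrete.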

A subspace of a vector space V is a subset of V that is a vector space under 
the addition and scalar multiplication in V. However, we give a simpler defini- 
tion that is more convenient to use. 


Definition. If $V$ is a vector space over a field $k$, then a subspace of $V$ is a subset
$U$ of $V$ such that

(i) $0 \in U$;

(ii) $u, u' \in U$ imply $u + u' \in U$;

(iii) $u \in U$ and $c \in k$ imply $cu \in U$.

Proposition 4.2. Every subspace $U$ of a vector space $V$ over a field $k$ is itself
a vector space over $k$.

Proof. By hypothesis, $U$ is closed under scalar multiplication: if $u \in U$ and
$c \in k$, then $cu \in U$. Axioms (i) through (iv) in the definition of vector space
hold for all scalars and for all vectors in $V$; in particular, they hold for all vectors
in $U$. For example, axiom (iii) says that $(ab)v = a(bv)$ holds for all $a, b \in k$
and all $v \in V$; in particular, this equation holds for all $u \in U$.

By hypothesis, $U$ is closed under addition: if $u, u' \in U$, then $u + u' \in U$.
Axioms (i)' through (iv)' in the definition of vector space hold for all scalars and
for all vectors in $V$; in particular, they hold for all vectors in $U$. Finally, axiom
(iii)' requires $0 \in U$, and this, too, is part of the hypothesis. •

Example 4.3. 

(i) The extreme cases $U = V$ and $U = \{0\}$ (where $\{0\}$ denotes the subset
consisting of the zero vector alone) are always subspaces of a vector space.
A subspace $U \subseteq V$ with $U \ne V$ is called a proper subspace of $V$; we may
write $U \subsetneq V$ to denote $U$ being a proper subspace of $V$.

(ii) If $v = (a_1, \ldots, a_n)$ is a nonzero vector in $\mathbb{R}^n$, then

$$\ell = \{av : a \in \mathbb{R}\}$$

is a line through the origin, and $\ell$ is a subspace of $\mathbb{R}^n$. For example, the
diagonal $\{(a, a) : a \in \mathbb{R}\}$ is a subspace of the plane $\mathbb{R}^2$.

Similarly, a plane through the origin consists of all vectors of the form
$av_1 + bv_2$, where $v_1, v_2$ is a fixed pair of noncollinear vectors, and $a, b$
vary over $\mathbb{R}$. It is easy to check that planes through the origin are subspaces
of $\mathbb{R}^n$.

By Proposition 4.2, lines and planes through the origin are vector spaces;
without this proposition, one would be obliged to check each of the ten
axioms in the definition of vector space.

(iii) If $m \le n$ and $\mathbb{R}^m$ is regarded as the set of all those vectors in $\mathbb{R}^n$ whose
last $n - m$ coordinates are 0, then $\mathbb{R}^m$ is a subspace of $\mathbb{R}^n$. For example,
we may regard $\mathbb{R}^1 = \mathbb{R}$ as all points $(x, 0)$ in $\mathbb{R}^2$; that is, $\mathbb{R}$ can be viewed
as the real axis in the plane.

(iv) If $k$ is a field, then a linear system over $k$ of $m$ equations in $n$ unknowns is
a set of equations

$$
\begin{aligned}
a_{11}x_1 + \cdots + a_{1n}x_n &= b_1 \\
a_{21}x_1 + \cdots + a_{2n}x_n &= b_2 \\
&\ \ \vdots \\
a_{m1}x_1 + \cdots + a_{mn}x_n &= b_m,
\end{aligned}
$$

where $a_{ij}, b_i \in k$. A solution is a vector $s = (s_1, \ldots, s_n) \in k^n$ with
$\sum_j a_{ij}s_j = b_i$ for all $i$. The set of all solutions is a subset of $k^n$, called the
solution set of the system; if there are no solutions, one calls the system
inconsistent. A linear system is homogeneous if all the $b_i$ are 0. Since the
zero vector is always a solution of a homogeneous linear system, a
homogeneous system is always consistent. A solution $s = (s_1, \ldots, s_n)$ is called
nontrivial if some $s_j \ne 0$. The set of all solutions of a homogeneous linear
system forms a subspace of $k^n$, called the solution space (or nullspace) of
the system.

These definitions can be written more compactly in matrix notation.
The coefficient matrix of the system is $A = [a_{ij}]$. If $b$ is the column
vector $b = (b_1, \ldots, b_m)$, then $s$ is a solution if and only if $As = b$.

To see that the solution space is a subspace, let $Ax = 0$ be a homogeneous
system, and let $U$ be its solution space. Now $0 \in U$ because $A0 = 0$.
If $u, u' \in U$, then $Au = 0 = Au'$, and so $A(u + u') = Au + Au' = 0 + 0 = 0$;
hence, $u + u' \in U$. If $c \in k$ and $u \in U$, then $A(cu) = c(Au) = c0 = 0$,
and so $cu \in U$. Therefore, $U$ is a subspace of $k^n$.

We can solve systems of linear equations over the field $\mathbb{F}_p$, where $p$ is
a prime; that is, we can treat a system of congruences mod $p$ just as we
treat an ordinary system of equations over $\mathbb{R}$.

For example, the system of congruences

$$
\begin{aligned}
3x - 2y + z &\equiv 1 \bmod 7 \\
x + y - 2z &\equiv 0 \bmod 7 \\
-x + 2y + z &\equiv 4 \bmod 7
\end{aligned}
$$

can be regarded as a system of equations over the field $\mathbb{F}_7$. This system
can be solved just as in high school, for inverses mod 7 are now known:
$[2][4] = [1]$; $[3][5] = [1]$; $[6][6] = [1]$. The solution is

$$(x, y, z) = ([5], [4], [1]).$$

(v) Recall that the transpose of an $m \times n$ matrix $A = [a_{ij}]$ is the $n \times m$
matrix $A^T$ whose $ij$ entry is $a_{ji}$. The basic properties of transposing are:

$$(A + B)^T = A^T + B^T; \qquad (cA)^T = cA^T;$$

$$(AB)^T = B^T A^T; \qquad (A^T)^T = A.$$

We show that the set $S$ of all symmetric $n \times n$ matrices is a subspace of
$\mathrm{Mat}_n(k)$, where $A \in \mathrm{Mat}_n(k)$ is symmetric if $A^T = A$. If 0 denotes the
matrix all of whose entries are 0, then $0^T = 0$, and so $0 \in S$. If $A, B \in S$,
then

$$(A + B)^T = A^T + B^T = A + B,$$

so that $A + B \in S$. Finally, if $c \in k$ and $A \in S$, then

$$(cA)^T = c(A^T) = cA,$$

so that $cA \in S$. Therefore, $S$ is a subspace of $\mathrm{Mat}_n(k)$. By Proposition 4.2,
the set of all $n \times n$ symmetric matrices with entries in a field $k$ is a vector
space. ◄
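The system of congruences mod 7 in part (iv) can be solved by the same elimination used over $\mathbb{R}$, with division replaced by multiplication by inverses mod $p$. The sketch below is one way to do this for a square system with a unique solution; the function name `solve_mod_p` is hypothetical, and `pow(x, -1, p)` requires Python 3.8 or later.

```python
def solve_mod_p(A, b, p):
    """Gauss-Jordan elimination over F_p for a square system with a unique solution."""
    n = len(A)
    # Augmented matrix [A | b] with all entries reduced mod p.
    M = [[x % p for x in row] + [bi % p] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("coefficient matrix is singular mod p")
        M[col], M[pivot] = M[pivot], M[col]
        inv = pow(M[col][col], -1, p)              # inverse of the pivot in F_p
        M[col] = [(x * inv) % p for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [(x - factor * y) % p for x, y in zip(M[r], M[col])]
    return [row[n] for row in M]


A = [[3, -2, 1],
     [1, 1, -2],
     [-1, 2, 1]]
b = [1, 0, 4]
solution = solve_mod_p(A, b, 7)
```

Here `solution` is `[5, 4, 1]`, in agreement with $(x, y, z) = ([5], [4], [1])$.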

The following type of square matrix is important. 


Definition. An $m \times m$ matrix $A$ is nonsingular if there exists an $m \times m$ matrix
$B$ such that $AB = I$ and $BA = I$. One calls $B$ the inverse of $A$ and denotes it
by $A^{-1}$.

Recall that the dot product of two vectors $v = (a, b, c)$, $v' = (a', b', c')$ in
$\mathbb{R}^3$ is defined as

$$v \cdot v' = aa' + bb' + cc' \in \mathbb{R}.$$

There is a geometric interpretation of this number:

$$v \cdot v' = \|v\| \, \|v'\| \cos\theta,$$

where $\|v\|$ is the length of $v$ and $\theta$ is the angle between $v$ and $v'$. It follows that
if $v \cdot v' = 0$, then either $v = 0$, $v' = 0$, or that $v$ and $v'$ are orthogonal². We can
adapt dot product to more general spaces.


Definition. If $k$ is a field and $V$ is a vector space over $k$, then an inner product
on $V$ is a function $f: V \times V \to k$, usually denoted by $f(v, w) = (v, w)$, such
that

(i) $(v, w + w') = (v, w) + (v, w')$ for all $v, w, w' \in V$;

(ii) $(v, aw) = a(v, w)$ for all $v, w \in V$ and $a \in k$;

(iii) $(v, w) = (w, v)$ for all $v, w \in V$.

An inner product is called nondegenerate (or nonsingular) if, for all $v \in V$,
$(v, v) = 0$ implies $v = 0$.


2 In Greek, ortho means "right" and gon means "angle." Thus, orthogonal means right
angled or perpendicular.

Example 4.4.

(i) Let $k$ be any field, let $V = k^n$, and let $v = (a_1, \ldots, a_n)$, $v' = (a'_1, \ldots, a'_n)$
lie in $V$. Then

$$(v, v') = a_1a'_1 + \cdots + a_na'_n$$

is an inner product on $k^n$. If $k = \mathbb{R}$, then this inner product is nondegenerate,
for if $\sum a_i^2 = 0$, then each $a_i = 0$. However, if $k = \mathbb{C}$, then this
inner product is degenerate (not nondegenerate). For example, if $n = 2$
and $v = (1, i)$, then $(v, v) = 1 + i^2 = 0$. One usually repairs this for
vector spaces $V = \mathbb{C}^n$ by defining $(v, v') = \sum a_j \overline{a'_j}$, where $\overline{a}$ is complex
conjugate. This does not give an inner product [because axiom (ii) of the
definition may not hold: $(v, aw) = \overline{a}(v, w)$], but it does give $(v, v) = 0$
implies $v = 0$.

The same phenomenon can occur for inner products defined on vector
spaces over a finite field $k$. For example, let $k = \mathbb{F}_2$; if $n$ is even and
$v = (1, 1, \ldots, 1) \in k^n$, then $(v, v) = 0$; if $n$ is odd and $v = (0, 1, 1, \ldots, 1)$,
then $(v, v) = 0$.

(ii) Let $k$ be a field, and regard a vector in $k^n$ as an $n \times 1$ column matrix. If $A$
is an $n \times n$ symmetric matrix with entries in $k$, define an inner product on
$V = k^n$ by

$$(v, w) = v^T A w.$$

The reader may prove that this is an inner product, and that it is nondegenerate
if and only if $A$ is a nonsingular matrix. ◄
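The degenerate cases in part (i) are easy to reproduce numerically. In the sketch below, `dot` is the formula $a_1a'_1 + \cdots + a_na'_n$, `hermitian` is the repaired form with a complex conjugate on the second argument, and `dot_mod_p` reduces the dot product mod $p$; the names and the sample vectors of lengths 4 and 5 are illustrative.

```python
def dot(v, w):
    """The inner product a_1 a'_1 + ... + a_n a'_n of part (i)."""
    return sum(x * y for x, y in zip(v, w))


def hermitian(v, w):
    """The repaired form on C^n: sum of a_j times the conjugate of a'_j."""
    return sum(x * y.conjugate() for x, y in zip(v, w))


def dot_mod_p(v, w, p):
    """The same dot product, computed in F_p."""
    return dot(v, w) % p


v_complex = (1, 1j)              # the vector (1, i) in C^2
v_even = (1, 1, 1, 1)            # n = 4, all ones, over F_2
v_odd = (0, 1, 1, 1, 1)          # n = 5, first coordinate 0, over F_2
```

Each of `dot(v_complex, v_complex)`, `dot_mod_p(v_even, v_even, 2)`, and `dot_mod_p(v_odd, v_odd, 2)` is 0 although the vectors are nonzero, whereas `hermitian(v_complex, v_complex)` equals 2.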

We now use inner products to construct some subspaces. 

Example 4.5. 

Let $V$ be a vector space with an inner product, and let $W \subseteq V$ be a subspace.
Define

$$W^\perp = \{v \in V : (w, v) = 0 \text{ for all } w \in W\}.$$

Let us check that $W^\perp$ (pronounced W perp) is a subspace. Clearly, $0 \in W^\perp$.
If $v, v' \in W^\perp$, then $(w, v) = 0$ and $(w, v') = 0$ for all $w \in W$. Hence,
$(w, v + v') = (w, v) + (w, v') = 0$ for all $w \in W$, and so $v + v' \in W^\perp$. Finally,
if $v \in W^\perp$ and $a \in k$, then $(w, av) = a(w, v) = 0$, so that $av \in W^\perp$. Therefore,
$W^\perp$ is a subspace of $V$; it is called the orthogonal complement of $W$, to remind
us that $(v, w) = 0$ in euclidean space does imply that $v$ and $w$ are orthogonal
vectors. It is easy to see that $W \cap W^\perp = \{0\}$ if and only if the inner product is
nondegenerate. ◄

Dimension is a rather subtle idea. One thinks of a curve in the plane, that 
is, the image of a continuous function $f : \mathbb{R} \to \mathbb{R}^2$, as a one-dimensional subset
of a two-dimensional space. Imagine the confusion at the end of the nineteenth
century when a “space-filling curve” was discovered: there exists a continuous
$f : \mathbb{R} \to \mathbb{R}^2$ with image the whole plane! We will give a way of defining
dimension that works for analogs of euclidean space called vector spaces (there
are topological ways of defining dimension of more general spaces). 

The key observation in getting the “right” definition of dimension is to
understand why $\mathbb{R}^3$ is 3-dimensional. Every vector $(x, y, z)$ is a linear combination
of the three vectors $e_1 = (1, 0, 0)$, $e_2 = (0, 1, 0)$, and $e_3 = (0, 0, 1)$; that is,

$(x, y, z) = xe_1 + ye_2 + ze_3$.

It is not so important that every vector is a linear combination of these specific 
vectors; what is important is that there are three of them, for it turns out that three 
is the smallest number of vectors with the property that every vector is a linear 
combination of them. 

Definition. A list³ in a vector space $V$ is a finite sequence $X = v_1, \ldots, v_n$ of
vectors in $V$, where $n \in \mathbb{N}$. In particular, we allow $n = 0$ and the empty list
which has no terms.

More precisely, we are saying that a list $X = v_1, \ldots, v_n$ is a function

$\varphi : \{1, 2, \ldots, n\} \to V$,

for some $n \in \mathbb{N}$, with $\varphi(i) = v_i$ for all $i$. Note that $X$ is ordered in the sense
that there is a first vector $v_1$, a second vector $v_2$, and so forth. A vector may
appear several times on a list; that is, $\varphi$ need not be injective. The empty list $\varphi$
has $\operatorname{im} \varphi = \varnothing$.

Definition. Let $V$ be a vector space over a field $k$. A linear combination of a
list $v_1, \ldots, v_n$ in $V$ is a vector $v$ of the form

$v = a_1 v_1 + \cdots + a_n v_n$,

where $n \in \mathbb{N}$ and $a_i \in k$ for all $i$. We define the linear combination of the empty
list to be 0, the zero vector.

Definition. If $X = v_1, \ldots, v_m$ is a list in a vector space $V$, then

$\langle X \rangle = \langle v_1, \ldots, v_m \rangle$

is the set of all the linear combinations of $v_1, \ldots, v_m$; it is called the subspace
spanned by $X$. We also say that $v_1, \ldots, v_m$ spans $\langle v_1, \ldots, v_m \rangle$.

³ Actually, a list $X = a_1, \ldots, a_n$ is exactly the same thing as an $n$-tuple $(a_1, \ldots, a_n)$. We
write $n$-tuples with parentheses to conform to standard notation.

Example 4.6.

If $A$ is an $m \times n$ matrix over a field $k$, then its row space $\mathrm{Row}(A)$ is the
subspace of $k^n$ spanned by the rows of $A$. The column space $\mathrm{Col}(A)$ of $A$ is the
subspace of $k^m$ spanned by the columns of $A$. Note that $\mathrm{Row}(A) = \mathrm{Col}(A^T)$
and $\mathrm{Col}(A) = \mathrm{Row}(A^T)$, for the columns of $A$ are the rows of $A^T$ (and the rows
of $A$ are the columns of $A^T$).

If $A$ is an $m \times n$ matrix, its row space $\mathrm{Row}(A)$, its column space $\mathrm{Col}(A)$,
and the solution space $\mathrm{Sol}(A)$ of $Ax = 0$ are related. It can be shown that
if the usual inner product on $k^n$ is nondegenerate, then $\mathrm{Row}(A)^\perp = \mathrm{Sol}(A)$,
$\mathrm{Col}(A)^\perp = \mathrm{Sol}(A^T)$, and $\mathrm{Sol}(A)^\perp = \mathrm{Row}(A)$ (see Leon, Linear Algebra with
Applications, pages 242-244). ◄

Proposition 4.7. If $X = v_1, \ldots, v_m$ is a list in a vector space $V$, then $\langle X \rangle$ is a
subspace of $V$ containing the subset $\{v_1, \ldots, v_m\}$.

Proof. Let us write $L = \langle v_1, \ldots, v_m \rangle$. Now $0 \in L$, for

$0 = 0v_1 + \cdots + 0v_m$.

If $u = a_1 v_1 + \cdots + a_m v_m$ and $v = b_1 v_1 + \cdots + b_m v_m \in L$, then

$u + v = a_1 v_1 + \cdots + a_m v_m + b_1 v_1 + \cdots + b_m v_m$
$= a_1 v_1 + b_1 v_1 + \cdots + a_m v_m + b_m v_m$
$= (a_1 + b_1) v_1 + \cdots + (a_m + b_m) v_m \in L$.

Finally, if $c \in k$, then

$c(a_1 v_1 + \cdots + a_m v_m) = (ca_1) v_1 + \cdots + (ca_m) v_m \in L$.

Therefore, $L$ is a subspace.

To see that each $v_i \in L$, choose the linear combination having $a_i = 1$ and
all other coefficients 0. •

If $X = v_1, \ldots, v_n$ is a list in a vector space $V$, then its underlying set is the
subset $\{v_1, \ldots, v_n\}$. Note that $v_1, v_2, v_3$ and $v_2, v_1, v_3$ are distinct lists having
the same underlying set. Moreover, $v_1, v_2, v_2$ and $v_1, v_2$ are also distinct lists
having the same underlying set. One reason we are being so fussy about lists and 
underlying sets can be found in our discussion of coordinates on page 334. 


Lemma 4.8. If $X = v_1, \ldots, v_n$ is a list in a vector space $V$, then $\langle X \rangle$ depends
only on its underlying set.

Proof. If $\sigma \in S_n$ is a permutation, then define a list $X_\sigma = v_{\sigma(1)}, \ldots, v_{\sigma(n)}$.
Now a linear combination of $X$ is a vector $v = a_1 v_1 + \cdots + a_n v_n$. Since addition
in $V$ is commutative, $v$ is also a linear combination of the list $X_\sigma$. Therefore,
$\langle X \rangle = \langle X_\sigma \rangle$, for both subsets comprise the same vectors.

If the list $X$ has a repetition, say, $v_i = v_j$ for some $i \neq j$, then

$a_1 v_1 + \cdots + a_n v_n = a_1 v_1 + \cdots + (a_i + a_j) v_i + \cdots + \widehat{a_j v_j} + \cdots + a_n v_n$,

where $a_1 v_1 + \cdots + \widehat{a_j v_j} + \cdots + a_n v_n$ is the shorter sum with the term $a_j v_j$ deleted. It
follows that the set of linear combinations of $X$ is the same as the set of all linear
combinations of the shorter list obtained from $X$ by deleting $v_j$. •

We now extend the definition of $\langle Y \rangle$ to arbitrary, possibly infinite, subsets
$Y \subseteq V$.

Definition. If $Y$ is a subset of a vector space $V$, then $\langle Y \rangle$ is the set of all linear
combinations of lists $v_1, \ldots, v_n$, for $n \in \mathbb{N}$, whose terms all lie in $Y$.

If $Y$ is finite, then Lemma 4.8 shows that this definition coincides with our
earlier definition of $\langle Y \rangle$ on page 328.

Lemma 4.9. Let $V$ be a vector space over a field $k$.

(i) Every intersection of subspaces of $V$ is itself a subspace.

(ii) If $Y$ is a subset of $V$, then $\langle Y \rangle$ is the intersection of all the subspaces of $V$
containing $Y$.

(iii) If $Y$ is a subset of $V$, then $\langle Y \rangle$ is the smallest subspace of $V$ containing $Y$;
that is, if $U$ is any subspace of $V$ containing $Y$, then $\langle Y \rangle \subseteq U$.

Proof.

(i) Let $\mathcal{S}$ be a family of subspaces of $V$, and denote $\bigcap_{S \in \mathcal{S}} S$ by $W$. Since $0 \in S$
for every $S \in \mathcal{S}$, we have $0 \in W$. If $x, y \in W$, then $x, y \in S$ for every $S \in \mathcal{S}$; as
$S$ is a subspace, we have $x + y \in S$ for all $S \in \mathcal{S}$, and so $x + y \in W$. Finally,
if $x \in W$, then $x \in S$ for every $S \in \mathcal{S}$; if $c \in k$, then $cx \in S$ for all $S$, and so
$cx \in W$. Therefore, $W$ is a subspace of $V$.

(ii) Let $\mathcal{S}'$ denote the family of all the subspaces of $V$ containing the subset $Y$. We
claim that

$\langle Y \rangle = \bigcap_{S \in \mathcal{S}'} S$.

The inclusion $\subseteq$ is clear: if $v_1, \ldots, v_n$ is a list with each $v_i \in Y$ and $\sum_i c_i v_i \in
\langle Y \rangle$, then $\sum_i c_i v_i \in S$ for every $S \in \mathcal{S}'$, because a subspace contains the linear
combinations of any list of its vectors. (This argument even holds if $Y = \varnothing$,
for then $\langle Y \rangle = \{0\}$.) The reverse inclusion follows from a general fact about
intersections: for any $S_0 \in \mathcal{S}'$, we have $\bigcap_{S \in \mathcal{S}'} S \subseteq S_0$. In particular, $S_0 = \langle Y \rangle \in
\mathcal{S}'$, by Proposition 4.7.

(iii) A subspace $U$ containing $Y$ is one of the subspaces $S$ involved in the
intersection $\langle Y \rangle = \bigcap_{S \in \mathcal{S}'} S$. •

Were all terminology in algebra consistent, we would call $\langle Y \rangle$ the subspace
generated by $Y$. The reason for the different terms is that the theories of groups,
rings, and vector spaces developed independently of each other.

Example 4.10.

(i) Let $V = \mathbb{R}^2$, let $e_1 = (1, 0)$, and let $e_2 = (0, 1)$. Now $V = \langle e_1, e_2 \rangle$, for if
$v = (a, b) \in V$, then

$v = (a, 0) + (0, b)$
$= a(1, 0) + b(0, 1)$
$= ae_1 + be_2 \in \langle e_1, e_2 \rangle$.

(ii) If $k$ is a field and $V = k^n$, define $e_i$ as the $n$-tuple having 1 in the $i$th
coordinate and 0's elsewhere. The reader may adapt the argument in part (i) to
show that $e_1, \ldots, e_n$ spans $k^n$.

The list $e_1, \ldots, e_n$ is called the standard basis of $k^n$. Note, in $k^n$, that
$(a_1, \ldots, a_n) = a_1 e_1 + \cdots + a_n e_n$.

(iii) A vector space $V$ need not be spanned by a finite sequence. For example,
let $V = k[x]$, and suppose that $X = f_1(x), \ldots, f_m(x)$ is a finite list in $V$.
If $d$ is the largest degree of any of the $f_i(x)$, then every (nonzero) linear
combination $\sum_i a_i f_i(x)$, where $a_i \in k$, has degree at most $d$. Thus,
$x^{d+1}$ is not a linear combination of vectors in $X$, and so $X$ does not span
$k[x]$. ◄

The following definition makes sense even though we have not yet defined 
dimension. 

Definition. A vector space V is called finite-dimensional if it is spanned by a 
finite list; otherwise, V is called infinite-dimensional. 

Example 4.10(ii) shows that $k^n$ is finite-dimensional, while part (iii) of this
Example shows that $k[x]$ is infinite-dimensional. By Example 4.1(iii), both $\mathbb{R}$
and $\mathbb{C}$ are vector spaces over $\mathbb{Q}$, and each is infinite-dimensional.

Given a subspace $U$ of a vector space $V$, we seek a list $X$ which spans $U$.
Notice that $U$ can have many such lists; for example, if $X = v_1, v_2, \ldots, v_m$
spans $U$ and $u$ is any vector in $U$, then $v_1, v_2, \ldots, v_m, u$ also spans $U$. Let us,
therefore, seek a shortest list that spans $U$.

Definition. A list $X = v_1, \ldots, v_m$ in a vector space $V$ is a shortest spanning
list (or a minimal spanning list) if no proper sublist $v_1, \ldots, \widehat{v_i}, \ldots, v_m$ spans
$\langle v_1, \ldots, v_m \rangle \subseteq V$.

Proposition 4.11. If $V$ is a vector space, then the following conditions on a list
$X = v_1, \ldots, v_m$ spanning $V$ are equivalent:

(i) $X$ is not a shortest spanning list; that is, a proper sublist spans $\langle X \rangle$;

(ii) some $v_i$ is in the subspace spanned by the others; that is,

$v_i \in \langle v_1, \ldots, \widehat{v_i}, \ldots, v_m \rangle$;

(iii) there are scalars $a_1, \ldots, a_m$, not all zero, with

$\sum_{j=1}^m a_j v_j = 0$.

Proof. (i) $\Rightarrow$ (ii). If $X$ is not a shortest spanning list, then one of the vectors in
$X$, say, $v_i$, can be thrown out, and $v_i \in \langle v_1, \ldots, \widehat{v_i}, \ldots, v_m \rangle$.

(ii) $\Rightarrow$ (iii). If $v_i = \sum_{j \neq i} c_j v_j$, then define $a_i = -1 \neq 0$ and $a_j = c_j$ for all
$j \neq i$.

(iii) $\Rightarrow$ (i). The given equation implies that one of the vectors, say, $v_i$ (where $a_i \neq 0$), is a linear
combination of the others, say,

$v_i = -\sum_{j \neq i} a_i^{-1} a_j v_j$.

Deleting $v_i$ gives a shorter list, which still spans: if $v \in V$, then we know that
$v = \sum_{j=1}^m b_j v_j$, for the list $v_1, \ldots, v_m$ spans $V$. We rewrite:

$v = b_i v_i + \sum_{j \neq i} b_j v_j$
$= b_i \bigl(-\sum_{j \neq i} a_i^{-1} a_j v_j\bigr) + \sum_{j \neq i} b_j v_j \in \langle v_1, \ldots, \widehat{v_i}, \ldots, v_m \rangle$. •

Definition. A list $X = v_1, \ldots, v_m$ in a vector space $V$ is linearly dependent if
there are scalars $a_1, \ldots, a_m$, not all zero, with $\sum_{j=1}^m a_j v_j = 0$; otherwise, $X$ is
called linearly independent.

The empty set $\varnothing$ is defined to be linearly independent (we interpret $\varnothing$ as a
list of length 0).

Example 4.12.

(i) Any list $X = v_1, \ldots, v_m$ containing the zero vector is linearly dependent.

(ii) A list $v_1$ of length 1 is linearly dependent if and only if $v_1 = 0$; hence, a
list $v_1$ of length 1 is linearly independent if and only if $v_1 \neq 0$.

(iii) A list $v_1, v_2$ is linearly dependent if and only if one of the vectors is a scalar
multiple of the other.

(iv) If there is a repetition on the list $v_1, \ldots, v_m$ (that is, if $v_i = v_j$ for some
$i \neq j$), then $v_1, \ldots, v_m$ is linearly dependent: define $c_i = 1$, $c_j = -1$,
and all other $c = 0$. Therefore, if $v_1, \ldots, v_m$ is linearly independent, then
all the vectors $v_i$ are distinct. ◄

The contrapositive of Proposition 4.11 is worth stating.

Corollary 4.13. If $X = v_1, \ldots, v_m$ is a list spanning a vector space $V$, then $X$
is a shortest spanning list if and only if $X$ is linearly independent.

Linear independence has been defined indirectly, as not being linearly dependent.
Because of the importance of linear independence, let us define it directly.
A list $X = v_1, \ldots, v_m$ is linearly independent if, whenever a linear combination
$\sum_{j=1}^m a_j v_j = 0$, then every $a_j = 0$. Informally, this says that every “sublist”
of a linearly independent list is itself linearly independent (this is one reason for
decreeing that $\varnothing$ be linearly independent).

We have arrived at the notion we have been seeking. 

Definition. A basis of a vector space V is a linearly independent list that 
spans V. 

Thus, bases are shortest spanning lists. Of course, all the vectors in a linearly
independent list $v_1, \ldots, v_n$ are distinct, by Example 4.12(iv).

Example 4.14. 

In Example 4.10(ii), we saw that the standard basis $E = e_1, \ldots, e_n$ spans $k^n$,
where $e_i$ is the $n$-tuple having 1 in the $i$th coordinate and 0's elsewhere. To
see that $E$ is linearly independent, note that $\sum_{i=1}^n a_i e_i = (a_1, \ldots, a_n)$, so that
$\sum_{i=1}^n a_i e_i = 0$ if and only if each $a_i = 0$. Therefore, $E$ is a basis of $k^n$. ◄

Proposition 4.15. Let $X = v_1, \ldots, v_n$ be a list in a vector space $V$ over a field
$k$. Then $X$ is a basis if and only if each vector in $V$ has a unique expression as a
linear combination of vectors in $X$.

Proof. If a vector $v = \sum a_i v_i = \sum b_i v_i$, then $\sum (a_i - b_i) v_i = 0$, and so
independence gives $a_i = b_i$ for all $i$; that is, the expression is unique.

Conversely, existence of an expression shows that the list of $v_i$ spans. Moreover,
if $0 = \sum c_i v_i$ with not all $c_i = 0$, then the vector 0 does not have a unique
expression as a linear combination of the $v_i$. •

Definition. If $X = v_1, \ldots, v_n$ is a basis of a vector space $V$ and if $v \in V$,
then there are unique scalars $a_1, \ldots, a_n$ with $v = \sum_{i=1}^n a_i v_i$. The $n$-tuple
$(a_1, \ldots, a_n)$ is called the coordinate list of a vector $v \in V$ relative to the basis $X$.

If $E = e_1, \ldots, e_n$ is the standard basis of $V = k^n$, then each vector $v \in V$
has a unique expression

$v = a_1 e_1 + a_2 e_2 + \cdots + a_n e_n$,

where $a_i \in k$ for all $i$. The coordinate list of $v \in k^n$ coincides with its usual
coordinates, for

$v = (a_1, \ldots, a_n) = a_1 e_1 + \cdots + a_n e_n$.

Since a basis is a list, there is a first vector $v_1$, a second vector $v_2$, and so forth, and so the coefficients
in such a linear combination determine a unique $n$-tuple $(a_1, a_2, \ldots, a_n)$. Were a
basis merely a subset of $V$ and not a list, then there would be $n!$ coordinate lists
for every vector.
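To see how the order of a basis affects coordinate lists, here is a minimal sketch, assuming $V = \mathbb{Q}^n$ and exact arithmetic with Python's `fractions` module. The helper names `rref` and `coordinates` are illustrative only; the coordinates are found by row reducing the system $\sum_i a_i v_i = v$.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced row echelon form, list of pivot columns) of a matrix over Q."""
    m = [[Fraction(x) for x in row] for row in rows]
    ncols = len(m[0]) if m else 0
    pivots, r = [], 0
    for c in range(ncols):
        # look for a row at or below r with a nonzero entry in column c
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    return m, pivots

def coordinates(basis, v):
    """Coordinate list of v relative to an ordered basis of Q^n."""
    n = len(v)
    if len(basis) != n:
        raise ValueError("a basis of Q^n must have exactly n vectors")
    # column j of the coefficient matrix is the j-th basis vector; last column is v
    augmented = [[basis[j][i] for j in range(n)] + [v[i]] for i in range(n)]
    reduced, pivots = rref(augmented)
    if pivots != list(range(n)):
        raise ValueError("the given list is not linearly independent")
    return [reduced[i][n] for i in range(n)]

# The same underlying set, listed in two orders, gives two coordinate lists.
print(coordinates([(1, 1), (1, -1)], (3, 1)))   # coordinates 2, 1
print(coordinates([(1, -1), (1, 1)], (3, 1)))   # coordinates 1, 2
```

Reordering the basis permutes the coordinate list, which is why a basis is defined as a list rather than as a set.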

We are going to define the dimension of a vector space V to be the number 
of vectors in a basis. Two questions arise at once. 

(i) Does every vector space have a basis? 

(ii) Do all bases of a vector space have the same number of elements? 

The first question is easy to answer; the second needs some thought. 

Theorem 4.16. Every finite-dimensional vector space V has a basis. 

Proof. A finite spanning list $X$ exists, since $V$ is finite-dimensional. If it is
linearly independent, it is a basis; if not, $X$ can be shortened to a spanning sublist
$X'$, by Proposition 4.11. If $X'$ is linearly independent, it is a basis; if not, $X'$
can be shortened to a spanning sublist $X''$. Eventually, we arrive at a shortest
spanning sublist, which is independent and hence is a basis. •
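The shortening process in this proof can be carried out mechanically. The sketch below assumes vectors in $\mathbb{Q}^n$ and uses a greedy variant rather than the repeated deletion of the proof: it scans the spanning list once and keeps a vector only if it lies outside the span of those already kept. The function names `rank` and `shorten_to_basis` are illustrative.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of vectors over Q, by Gaussian elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def shorten_to_basis(spanning_list):
    """Return a linearly independent sublist with the same span."""
    basis = []
    for v in spanning_list:
        # v lies outside the span of `basis` exactly when the rank goes up
        if rank(basis + [v]) > len(basis):
            basis.append(v)
    return basis

X = [(1, 0, 0), (2, 0, 0), (0, 1, 0), (1, 1, 0), (0, 0, 1)]
print(shorten_to_basis(X))  # [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
```

Each discarded vector is a linear combination of earlier ones, so by Proposition 4.11 the span is unchanged, and the surviving sublist is linearly independent.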

Remark. The definitions of spanning and linear independence can be extended 
to infinite-dimensional vector spaces (when dealing with infinite-dimensional 
vector spaces, one usually speaks of subspaces spanned by subsets rather than 
by lists). We can prove that these vector spaces also have bases. For example, it 
turns out that a basis of $k[x]$ is $1, x, x^2, \ldots, x^n, \ldots$. ◄

We are now going to prove invariance of dimension, one of the most important
results about vector spaces.

Lemma 4.17. Let $u_1, \ldots, u_n$ span a vector space $V$. If $v_1, \ldots, v_m \in V$ and
$m > n$, then $v_1, \ldots, v_m$ is a linearly dependent list.

Proof. The proof is by induction on $n \geq 1$.

Base Step. If $n = 1$, then there are at least two vectors $v_1, v_2$, for $m > n$, and
$v_1 = a_1 u_1$ and $v_2 = a_2 u_1$. If $u_1 = 0$, then $v_1 = 0$ and the list of $v$'s is linearly
dependent. Suppose $u_1 \neq 0$. We may assume that $v_1 \neq 0$, or we are done;
hence, $a_1 \neq 0$. Therefore, $u_1 = a_1^{-1} v_1$, and so $v_1, v_2$ is linearly dependent (for
$v_2 - a_2 a_1^{-1} v_1 = 0$), and hence the larger list $v_1, \ldots, v_m$ is linearly dependent.

Inductive Step. There are equations, for $i = 1, \ldots, m$,

$v_i = a_{i1} u_1 + \cdots + a_{in} u_n$.

We may assume that some $a_{i1} \neq 0$, otherwise $v_1, \ldots, v_m \in \langle u_2, \ldots, u_n \rangle$, and
the inductive hypothesis applies. Changing notation if necessary (that is, by
re-ordering the $v$'s), we may assume that $a_{11} \neq 0$. For each $i \geq 2$, define

$v_i' = v_i - a_{i1} a_{11}^{-1} v_1 \in \langle u_2, \ldots, u_n \rangle$

[the coefficient of $u_1$ in $v_i'$ is $0 = a_{i1} - (a_{i1} a_{11}^{-1}) a_{11}$]. Since $m - 1 > n - 1$, the
inductive hypothesis gives scalars $b_2, \ldots, b_m$, not all 0, with

$b_2 v_2' + \cdots + b_m v_m' = 0$.

Rewrite this equation using the definition of $v_i'$:

$\bigl(-\sum_{i \geq 2} b_i a_{i1} a_{11}^{-1}\bigr) v_1 + b_2 v_2 + \cdots + b_m v_m = 0$.

Not all the coefficients are 0, and so $v_1, \ldots, v_m$ is linearly dependent. •

Theorem 4.18 (Invariance of Dimension). If $X = x_1, \ldots, x_n$ and $Y =
y_1, \ldots, y_m$ are bases of a vector space $V$, then $m = n$.

Proof. If $m \neq n$, then either $n < m$ or $m < n$. In the first case, $y_1, \ldots, y_m \in
\langle x_1, \ldots, x_n \rangle$, because $X$ spans $V$, and Lemma 4.17 gives $Y$ linearly dependent,
a contradiction. A similar contradiction arises if $m < n$, and so we must have
$m = n$. •


It is now permissible to make the following definition.

Definition. If $V$ is a finite-dimensional vector space over a field $k$, then its
dimension, denoted by $\dim(V)$, is the number of elements in a basis of $V$.

Example 4.19.

(i) Example 4.14 shows that $k^n$ has dimension $n$, for the standard basis has $n$
elements, and this agrees with our intuition when $k = \mathbb{R}$. Thus, the plane
$\mathbb{R} \times \mathbb{R}$ is two-dimensional!

(ii) If $V = \{0\}$, then $\dim(V) = 0$, for there are no elements in its basis $\varnothing$.
(This is the reason for defining $\varnothing$ to be linearly independent.)

(iii) Let $X = \{x_1, \ldots, x_n\}$ be a finite set. Define

$k^X = \{\text{functions } f : X \to k\}$.

Now $k^X$ is a vector space if we define addition $f + f'$ to be

$f + f' : x \mapsto f(x) + f'(x)$

and scalar multiplication $af$, for $a \in k$ and $f : X \to k$, by

$af : x \mapsto af(x)$.

It is easy to check that the set of $n$ functions of the form $f_x$, where $x \in X$,
defined by

$f_x(y) = \begin{cases} 1 & \text{if } y = x; \\ 0 & \text{if } y \neq x, \end{cases}$

forms a basis, and so $\dim(k^X) = n = |X|$.

This is not a new example: an $n$-tuple $(a_1, \ldots, a_n)$ is really a function
$f : \{1, \ldots, n\} \to k$ with $f(i) = a_i$ for all $i$. Thus, the functions $f_x$
comprise the standard basis. ◄


The following proof illustrates the intimate relation between linear algebra 
and systems of linear equations. 


Corollary 4.20. A homogeneous system of linear equations over a field $k$ with
more unknowns than equations has a nontrivial solution.

Proof. An $n$-tuple $(s_1, \ldots, s_n)$ is a solution of a system

$a_{11} x_1 + \cdots + a_{1n} x_n = 0$
$\vdots$
$a_{m1} x_1 + \cdots + a_{mn} x_n = 0$

if $a_{i1} s_1 + \cdots + a_{in} s_n = 0$ for all $i$. In other words, if $C_1, \ldots, C_n$ are the columns
of the $m \times n$ coefficient matrix $A = [a_{ij}]$, then the definition of matrix
multiplication gives

$s_1 C_1 + \cdots + s_n C_n = 0$.

Note that $C_i \in k^m$. Now $k^m$ can be spanned by $m$ vectors (the standard basis,
for example). Since $n > m$, by hypothesis, Lemma 4.17 shows that the list
$C_1, \ldots, C_n$ is linearly dependent; there are scalars $p_1, \ldots, p_n$, not all zero, with
$p_1 C_1 + \cdots + p_n C_n = 0$. Therefore, $(p_1, \ldots, p_n)$ is a nontrivial solution of the
system. •
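The corollary guarantees that a nontrivial solution exists; one way to actually find one is row reduction. The following is a minimal sketch over $\mathbb{Q}$, not the argument of the proof: it row reduces $A$, sets one free unknown equal to 1, and solves for the pivot unknowns. The names `rref` and `nontrivial_solution` are illustrative.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced row echelon form, list of pivot columns) of a matrix over Q."""
    m = [[Fraction(x) for x in row] for row in rows]
    ncols = len(m[0]) if m else 0
    pivots, r = [], 0
    for c in range(ncols):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    return m, pivots

def nontrivial_solution(A):
    """Return a nonzero x with Ax = 0; A must have a non-pivot (free) column."""
    n = len(A[0])
    reduced, pivots = rref(A)
    free = [c for c in range(n) if c not in pivots]
    if not free:
        raise ValueError("only the trivial solution exists")
    f = free[0]
    x = [Fraction(0)] * n
    x[f] = Fraction(1)          # set one free unknown to 1, the others to 0
    for row, pc in zip(reduced, pivots):
        x[pc] = -row[f]         # pivot row reads x[pc] + row[f] * x[f] = 0
    return x

A = [[1, 2, 3],
     [4, 5, 6]]                 # 2 equations, 3 unknowns
x = nontrivial_solution(A)
print(x)                        # the solution (1, -2, 1)
```

When there are more unknowns than equations, there are at most $m < n$ pivot columns, so a free column always exists and the function never raises.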

Definition. A list $u_1, \ldots, u_m$ in a vector space $V$ is a longest linearly
independent list (or a maximal linearly independent list) if there is no vector $v \in V$
such that $u_1, \ldots, u_m, v$ is linearly independent.

Lemma 4.21. Let $V$ be a finite-dimensional vector space.

(i) Let $v_1, \ldots, v_m$ be a linearly independent list in $V$, and let $v \in V$. If
$v \notin \langle v_1, \ldots, v_m \rangle$, then $v_1, \ldots, v_m, v$ is linearly independent.

(ii) If a longest linearly independent list $X = v_1, \ldots, v_n$ exists, then it is a
basis of $V$.

Proof.

(i) Let $av + \sum a_i v_i = 0$. If $a \neq 0$, then $v = -a^{-1} \sum a_i v_i \in \langle v_1, \ldots, v_m \rangle$, a
contradiction. Therefore, $a = 0$ and $\sum a_i v_i = 0$. But linear independence of
$v_1, \ldots, v_m$ implies each $a_i = 0$, and so the longer list $v_1, \ldots, v_m, v$ is linearly
independent.

(ii) If $X$ is not a basis, then it does not span: there is $w \in V$ with $w \notin \langle v_1, \ldots, v_n \rangle$.
But the longer list $X, w$ is linearly independent, by part (i), contradicting $X$ being
a longest independent list. •

It is not obvious that longest linearly independent lists always exist; that they 
do exist follows from the next result, which is quite useful in its own right. 

Proposition 4.22. If $Z = u_1, \ldots, u_m$ is a linearly independent list in an
$n$-dimensional vector space $V$, then $Z$ can be extended to a basis; that is, there
are vectors $v_1, \ldots, v_{n-m}$ so that $u_1, \ldots, u_m, v_1, \ldots, v_{n-m}$ is a basis of $V$.

Proof. If $m > n$, then Lemma 4.17 implies that $Z$ is linearly dependent, a
contradiction; therefore, $m \leq n$. If the linearly independent list $Z$ does not span
$V$, there is $v_1 \in V$ with $v_1 \notin \langle Z \rangle$, and the longer list $Z, v_1 = u_1, \ldots, u_m, v_1$ is
linearly independent, by Lemma 4.21. If $Z, v_1$ does not span $V$, there is $v_2 \in V$
with $v_2 \notin \langle Z, v_1 \rangle$. This process eventually stops, for the length of these lists can
never exceed $n = \dim(V)$. •

Corollary 4.23. If $\dim(V) = n$, then any list of $n + 1$ or more vectors is linearly
dependent.

Proof. Otherwise, such a list could be extended to a basis having too many
elements. •
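The extension process in the proof of Proposition 4.22 can be made concrete when $V = \mathbb{Q}^n$. The sketch below is one possible realization, under the assumption that the new vectors are drawn from the standard basis: whenever some $e_i$ lies outside the current span, it is appended, as Lemma 4.21 permits. The names `rank` and `extend_to_basis` are illustrative.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of vectors over Q, by Gaussian elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def extend_to_basis(Z, n):
    """Extend a linearly independent list Z in Q^n to a basis of Q^n."""
    if rank(Z) != len(Z):
        raise ValueError("Z must be linearly independent")
    basis = list(Z)
    for i in range(n):
        e = tuple(1 if j == i else 0 for j in range(n))
        # e is outside the span of `basis` exactly when adding it raises the rank
        if rank(basis + [e]) > len(basis):
            basis.append(e)
    return basis

print(extend_to_basis([(1, 1, 0)], 3))  # [(1, 1, 0), (1, 0, 0), (0, 0, 1)]
```

One pass over $e_1, \ldots, e_n$ suffices: afterwards every $e_i$ lies in the span of the list, so the list spans $\mathbb{Q}^n$.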


Corollary 4.24. Let $V$ be a vector space with $\dim(V) = n$.

(i) A list $X$ of $n$ vectors which spans $V$ must be linearly independent.

(ii) Any linearly independent list $Y$ of $n$ vectors must span $V$.

Proof. 

(i) If the list X is linearly dependent, then it could be shortened to give a basis of 
V which is too small. 

(ii) If the list Y does not span V, then it could be lengthened to give a basis of V 
which is too large. • 


Corollary 4.25. Let $U$ be a subspace of a vector space $V$ of dimension $n$.

(i) Then $U$ is finite-dimensional and $\dim(U) \leq \dim(V)$.

(ii) If $\dim(U) = \dim(V)$, then $U = V$.

Proof.

(i) If $U = \{0\}$, there is nothing to prove; otherwise, take a nonzero $u_1 \in U$. If $U = \langle u_1 \rangle$, then $U$ is finite-dimensional. Otherwise, there
is $u_2 \in U$ with $u_2 \notin \langle u_1 \rangle$. By Lemma 4.21, $u_1, u_2$ is linearly independent. If $U = \langle u_1, u_2 \rangle$,
we are done. This process cannot be repeated $n + 1$ times, for then $u_1, \ldots, u_{n+1}$
would be a linearly independent list in $U \subseteq V$, contradicting Corollary 4.23.

A basis of $U$ is linearly independent, and so it can be extended to a basis of $V$.

(ii) If $\dim(U) = \dim(V)$, then a basis of $U$ is already a basis of $V$ (otherwise it
could be extended to a basis of $V$ that would be too large). •


Example 4.26. 

A projective plane of order $n$ was defined, in Chapter 3, as a set $X$ with $|X| =
n^2 + n + 1$, and a family of subsets of $X$, called lines, each having $n + 1$ points,
such that every two points determine a unique line. If $q$ is a prime power, we
constructed a projective plane of order $q$ by adjoining a line at infinity to $(\mathbb{F}_q)^2$.

We now give a second construction of a projective plane. Let $k$ be a field and
let $W = k^3$. A line $L$ in $k^3$ through the origin consists of all the scalar multiples
of any one of its nonzero vectors: if $v = (a, b, c) \in L$ and $v \neq (0, 0, 0)$, then

$L = \{rv = (ra, rb, rc) : r \in k\}$.

Of course, if $v'$ is another nonzero vector in $L$, then $L = \{rv' : r \in k\}$. Thus,
both $v$ and $v'$ span $L$ if and only if both are nonzero and $v' = tv$ for some
nonzero $t \in k$. Define a relation on the set of all nonzero vectors in $k^3$ by:

$v = (a, b, c) \sim v' = (a', b', c')$ if there exists $t \in k$ with $v' = tv$.

Note that $t \neq 0$ lest $tv = (0, 0, 0)$. It is easy to check that $\sim$ is an equivalence
relation on $W - \{(0, 0, 0)\}$, and we denote the equivalence class of $v = (a, b, c)$
by

$[v] = [a, b, c]$.

An equivalence class $[v]$ is called a projective point, and the set of all projective
points is called the projective plane over $k$, denoted by $\mathbb{P}^2(k)$. If $\pi$ is a plane in
$k^3$ through the origin (that is, if $\pi$ is a 2-dimensional subspace of $k^3$), then we
define the projective line $[\pi]$ to consist of all the projective points $[v]$ for which
$v \in \pi$.

Corollary 4.27. Let $k$ be a field.

(i) Every two distinct projective points $[v]$ and $[v']$ in $\mathbb{P}^2(k)$ lie in a unique
projective line.

(ii) Two distinct projective lines $[\pi]$ and $[\pi']$ in $\mathbb{P}^2(k)$ intersect in a unique
projective point.

Proof.

(i) That $[v]$ and $[v']$ are projective points says that $v$ and $v'$ are nonzero vectors in
$k^3$; that $[v] \neq [v']$ says that $v \not\sim v'$; that is, there is no scalar $t \neq 0$ with $v' = tv$,
so that $v, v'$ is a linearly independent list. Therefore, there is a unique plane
$\pi = \langle v, v' \rangle$ through the origin containing $v$ and $v'$, and so $[\pi]$ is a projective line
containing $[v]$ and $[v']$. This projective line is unique, for if $[v], [v'] \in [\pi']$, then
$v, v' \in \pi'$, and so $\pi \subseteq \pi'$. Corollary 4.25(ii) gives $\pi = \pi'$, and so $[\pi] = [\pi']$.

(ii) Consider $\pi$ and $\pi'$ in $k^3$. By Exercise 4.18 on page 344,

$\dim(\pi + \pi') + \dim(\pi \cap \pi') = \dim(\pi) + \dim(\pi')$.

Since $\pi \neq \pi'$, we have $\pi \subsetneq \pi + \pi'$; hence, $2 = \dim(\pi) < \dim(\pi + \pi') \leq 3 =
\dim(k^3)$, and so $\dim(\pi + \pi') = 3$. Hence, $\dim(\pi \cap \pi') = 2 + 2 - 3 = 1$, so that
$[\pi \cap \pi'] = [\pi] \cap [\pi']$ is a projective point. The point of intersection is unique,
lest we contradict part (i). •

Proposition 4.28. If $q = p^n$ for some prime $p$, then there exists a projective
plane of order $q$.

Proof. Let $X = \mathbb{P}^2(k)$, where $k = \mathbb{F}_q$. Now $|k^3| = q^3$, and so there are $q^3 - 1$
nonzero vectors in $k^3$. If $v \in k^3$ is nonzero, then $|[v]| = q - 1$, for there are
exactly $q - 1$ nonzero scalars in $k$. Thus, $|X| = (q^3 - 1)/(q - 1) = q^2 + q + 1$.
Finally, a plane $\pi$ through the origin in $k^3$ has $q^2 - 1$ nonzero vectors, and so
$|[\pi]| = (q^2 - 1)/(q - 1) = q + 1$. By Corollary 4.27, $X$ is a projective plane of
order $q$. •
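The counts in this proof can be checked by brute force for small fields. The sketch below assumes $q = p$ is prime, so that arithmetic in $\mathbb{F}_p$ is arithmetic modulo $p$; it does not handle proper prime powers. Each projective point is represented by the scalar multiple whose first nonzero coordinate is 1. The names `normalize` and `projective_plane` are illustrative, and `pow(x, -1, p)` requires Python 3.8 or later.

```python
from itertools import product

def normalize(v, p):
    """Scale a nonzero vector in F_p^3 so that its first nonzero coordinate is 1."""
    lead = next(x for x in v if x % p != 0)
    inv = pow(lead, -1, p)  # inverse of lead modulo the prime p
    return tuple((inv * x) % p for x in v)

def projective_plane(p):
    """Return (points, lines) of the projective plane over F_p, p prime."""
    points = {normalize(v, p) for v in product(range(p), repeat=3) if any(v)}
    lines = set()
    for P in points:
        for Q in points:
            if P != Q:
                # the plane <P, Q> consists of all sP + tQ; collect its projective points
                line = frozenset(
                    normalize(tuple((s * a + t * b) % p for a, b in zip(P, Q)), p)
                    for s, t in product(range(p), repeat=2)
                    if (s, t) != (0, 0)
                )
                lines.add(line)
    return points, lines

points, lines = projective_plane(2)
print(len(points), len(lines))           # 7 7
print({len(line) for line in lines})     # {3}
```

For $p = 2$ this gives $2^2 + 2 + 1 = 7$ points and lines with $2 + 1 = 3$ points each, in agreement with the formulas in the proof.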

Here is the usual definition of a projective plane. 

Definition. Let $X$ be a set and let $\mathcal{L}$ be a family of subsets of $X$, called lines.
Then $(X, \mathcal{L})$ is a projective plane if

(i) Every two lines intersect in a unique point.

(ii) Every two points determine a unique line.

(iii) There exist 4 points in $X$ no three of which are collinear.

(iv) There are 4 lines in $\mathcal{L}$ no three of which contain the same point.

In the special case when X is finite, this definition is equivalent to the defi- 
nition given in Chapter 3. 

Define the dual of a statement $S$ about $(X, \mathcal{L})$ to be the statement obtained
from $S$ in which the terms point and line are interchanged as are the terms
containing and contained in. The dual of each axiom in the definition of projective
plane is another axiom. We conclude that any theorem about projective planes
yields a dual theorem whose proof is obtained from the original proof by
dualizing each of the statements in its proof.

One can also see duality by comparing the construction of $\mathbb{P}^2(k)$ with the
construction of $k^2 \cup \omega$ in Chapter 3, where

$\omega = \{\omega_\ell : \ell \text{ is a line through the origin}\}$

is the line at infinity. In more detail, let $\ell = \{r(a, b) : r \in k\}$ be a line in $k^2$
through the origin, where $(a, b) \neq (0, 0)$. We may denote $\ell$ by $[a, b]$ and

$\omega_\ell = \omega_{[a, b]}$.

Note that this notation is consistent with $[a, b, c] \in \mathbb{P}^2(k)$; that is, if $\ell =
\{r(a', b') : r \in k\}$, then there is a nonzero $t \in k$ with $(a', b') = t(a, b)$. Define a
function $\varphi : \mathbb{P}^2(k) \to k^2 \cup \omega$ by

$\varphi([a, b, c]) = \begin{cases} (ac^{-1}, bc^{-1}) & \text{if } c \neq 0; \\ \omega_{[a, b]} & \text{if } c = 0. \end{cases}$

It is straightforward to check that $\varphi$ is a (well-defined) bijection.
A proof of the following lemma is not difficult.





Lemma. A subset $\pi \subseteq k^3$ is a plane through the origin if and only if there
are $p, q, r \in k$, not all zero, with $\pi = \{(a, b, c) \in k^3 : pa + qb + rc = 0\}$.
Moreover, if $\pi' = \{(a, b, c) \in k^3 : p'a + q'b + r'c = 0\}$, then $\pi = \pi'$ if and
only if there is a nonzero $t \in k$ with $(p', q', r') = t(p, q, r)$.

Projective points almost have coordinates: if $v = (a, b, c)$, then we call
$[a, b, c]$ the homogeneous coordinates of the projective point $[v]$ (these are
defined only up to nonzero scalar multiple). In light of the Lemma, projective lines,
too, almost have coordinates: if $\pi = \{(a, b, c) \in k^3 : pa + qb + rc = 0\}$, then we
call $[p, q, r]$ the homogeneous coordinates of the projective line $[\pi]$ (these are
defined only up to nonzero scalar multiple). The bijection $\varphi : \mathbb{P}^2(k) \to k^2 \cup \omega$
preserves lines, and the duality in projective planes can be viewed as replacing a
projective point with homogeneous coordinates $[a, b, c]$ with the projective line
having these same homogeneous coordinates. ◄

We are now going to apply linear algebra to fields. 

Proposition 4.29 (= Proposition 3.119). If $E$ is a finite field, then $|E| = p^n$
for some prime $p$ and some $n \geq 1$.

Proof. By Lemma 3.120(i), the prime field of $E$ is isomorphic to $\mathbb{F}_p$ for some
prime $p$. Since $E$ is finite, it is finite-dimensional as a vector space over $\mathbb{F}_p$; say, $\dim(E) = n$. If $v_1, \ldots, v_n$
is a basis, then there are exactly $p^n$ vectors $a_1 v_1 + \cdots + a_n v_n \in E$, where $a_i \in \mathbb{F}_p$
for all $i$. •

Definition. If $k$ is a subfield of a field $K$, then we usually say that $K$ is an
extension of $k$. We abbreviate this by writing “$K/k$ is an extension.”⁴

If $K/k$ is an extension, then $K$ may be regarded as a vector space over $k$,
as in Example 4.1(iii). We say that $K$ is a finite extension of $k$ if $K$ is a
finite-dimensional vector space over $k$. The dimension of $K$, denoted by $[K : k]$, is
called the degree of $K/k$.

Here is the reason $[K : k]$ is called the degree.

Proposition 4.30. Let $E/k$ be an extension, let $z \in E$ be a root of an irreducible
polynomial $p(x) \in k[x]$, and let $k(z)$ be the smallest subfield of $E$ containing $k$
and $z$. Then

$[k(z) : k] = \dim_k(k(z)) = \deg(p)$.

Proof. Proposition 3.116(iv) says that each element in $k(z)$ has a unique
expression of the form

$b_0 + b_1 z + \cdots + b_{n-1} z^{n-1}$,

where $b_i \in k$ and $n = \deg(p)$. In the language of linear algebra (which was not
available in Chapter 3), the list $1, z, z^2, \ldots, z^{n-1}$ is a basis of $k(z)/k$. •

⁴ We pronounce $K/k$ as “$K$ over $k$”; there should be no confusing this notation with that
of a quotient ring, for $K$ is a field and hence it has no proper nonzero ideals.

The following formula is quite useful, especially when one is proving a the- 
orem by induction on degrees. 

Theorem 4.31. Let $k \subseteq K \subseteq E$ be fields, with $K$ a finite extension of $k$ and $E$
a finite extension of $K$. Then $E$ is a finite extension of $k$, and

$[E : k] = [E : K][K : k]$.

Proof. If $A = a_1, \ldots, a_n$ is a basis of $K$ over $k$ and if $B = b_1, \ldots, b_m$ is a
basis of $E$ over $K$, then it suffices to prove that a list $X$ of all $a_i b_j$ is a basis of
$E$ over $k$.

To see that $X$ spans $E$, take $e \in E$. Since $B$ is a basis of $E$ over $K$, there
are scalars $\lambda_j \in K$ with $e = \sum_j \lambda_j b_j$. Since $A$ is a basis of $K$ over $k$, there are
scalars $\mu_{ji} \in k$ with $\lambda_j = \sum_i \mu_{ji} a_i$. Therefore, $e = \sum_{ij} \mu_{ji} a_i b_j$, and $X$ spans
$E$ over $k$.

To prove that $X$ is linearly independent over $k$, assume that there are scalars
$\mu_{ji} \in k$ with $\sum_{ij} \mu_{ji} a_i b_j = 0$. If we define $\lambda_j = \sum_i \mu_{ji} a_i$, then $\lambda_j \in K$ and
$\sum_j \lambda_j b_j = 0$. Since $B$ is linearly independent over $K$, it follows that

$0 = \lambda_j = \sum_i \mu_{ji} a_i$

for all $j$. Since $A$ is linearly independent over $k$, it follows that $\mu_{ji} = 0$ for all $j$
and $i$, as desired. •
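As a worked instance of the degree formula (an illustration chosen here, not an example from the text), take $k = \mathbb{Q}$, $K = \mathbb{Q}(\sqrt{2})$, and $E = K(i) \subseteq \mathbb{C}$. Both degrees below come from Proposition 4.30, and the basis of $E$ over $\mathbb{Q}$ is the list of products $a_i b_j$ described in the proof.

```latex
\begin{align*}
[K : \mathbb{Q}] &= 2, \quad \text{basis } 1, \sqrt{2}
   \quad (x^2 - 2 \text{ is irreducible over } \mathbb{Q}), \\
[E : K] &= 2, \quad \text{basis } 1, i
   \quad (x^2 + 1 \text{ has no root in } K \subseteq \mathbb{R}), \\
[E : \mathbb{Q}] &= [E : K][K : \mathbb{Q}] = 2 \cdot 2 = 4,
   \quad \text{basis } 1, \sqrt{2}, i, \sqrt{2}\, i.
\end{align*}
```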

Definition. Assume that K/k is an extension and that z € K. We call z alge- 
braic over k if there is some nonzero polynomial f(x) e k[x \ having z as a root; 
otherwise, z is called transcendental over k. 

When one says that a real number is transcendental, one usually means that 
it is transcendental over Q. For example, F. Lindemann (1852-1939) proved, 
in 1882, that it is transcendental, so that [Q(tt) : Q] is infinite (see A. Baker, 
Transcendental Number Theory, p. 5). Using this fact, we can see that M, viewed 
as a vector space over Q, is infinite-dimensional. (For a proof of the irrationality 
of 7r, a more modest result, we refer the reader to Niven and Zuckerman, An 
Introduction to the Theory of Numbers.) 

Proposition 4.32. If K/k is a finite extension, then every $z \in K$ is algebraic
over k.

Proof. If $[K : k] = n$, then the list $1, z, z^2, \dots, z^n$ has length n + 1, and so it is
linearly dependent, by Corollary 4.23. Hence, there are $a_i \in k$, not all zero, with
$\sum_{i=0}^{n} a_i z^i = 0$. If we define $f(x) = \sum_{i=0}^{n} a_i x^i$, then f(x) is not the zero
polynomial and f(z) = 0. Therefore, z is algebraic over k. •
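
The argument can be tried by hand in a small case. The sketch below is only an illustration, under the assumption that K = Q(sqrt 2) with basis 1, sqrt 2 over Q, so that n = 2; the names `mul` and `dependency` are hypothetical helpers, not notation from the text. It finds a linear dependency among 1, z, z^2 and so a nonzero polynomial having z as a root.

```python
from fractions import Fraction

def mul(u, v):
    """Multiply a + b*sqrt(2) and c + d*sqrt(2), stored as pairs (a, b) and (c, d)."""
    a, b = u
    c, d = v
    return (a * c + 2 * b * d, a * d + b * c)

def dependency(z):
    """Return (c0, c1, c2), not all zero, with c0 + c1*z + c2*z^2 = 0."""
    if z[1] == 0:
        # z is rational, so 1 and z are already dependent: -z + 1*z = 0.
        return (-z[0], Fraction(1), Fraction(0))
    one = (Fraction(1), Fraction(0))
    powers = [one, z, mul(z, z)]      # coordinates of 1, z, z^2 in the basis 1, sqrt(2)
    r1 = [p[0] for p in powers]       # rational parts
    r2 = [p[1] for p in powers]       # sqrt(2) parts
    # The cross product of r1 and r2 is orthogonal to both rows,
    # so its entries are coefficients of a dependency among the three powers.
    return (r1[1] * r2[2] - r1[2] * r2[1],
            r1[2] * r2[0] - r1[0] * r2[2],
            r1[0] * r2[1] - r1[1] * r2[0])

z = (Fraction(1), Fraction(1))        # z = 1 + sqrt(2)
c0, c1, c2 = dependency(z)            # coefficients of f(x) = c0 + c1*x + c2*x^2
```

For z = 1 + sqrt 2 the coefficients describe f(x) = x^2 - 2x - 1. For larger n one would replace the cross product by Gaussian elimination, which is discussed later in this chapter.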


Exercises 

* 4.1 (i) If $f : k \to k$ is a function, where k is a field, and if $\alpha \in k$, define a new
function $\alpha f : k \to k$ by $a \mapsto \alpha f(a)$. Prove that with this definition of
scalar multiplication, the ring $\mathcal{F}(k)$ of all functions on k is a vector space
over k.

(ii) If $\mathcal{P}(k) \subseteq \mathcal{F}(k)$ denotes the family of all polynomial functions
$a \mapsto \alpha_n a^n + \cdots + \alpha_1 a + \alpha_0$, prove that $\mathcal{P}(k)$ is a subspace of $\mathcal{F}(k)$.

4.2 Prove that the only subspaces of a vector space V are {0} and V itself if and only
if $\dim(V) \le 1$.

4.3 Prove, in the presence of all the other axioms in the definition of vector space, that
the commutative law for vector addition is redundant; that is, if V satisfies all the
other axioms, then u + v = v + u for all $u, v \in V$.

4.4 Is L a subspace of $\mathrm{Mat}_n(k)$ if $L \subseteq \mathrm{Mat}_n(k)$ is the subset consisting of all the n × n
Latin squares?

4.5 If V is a vector space over $\mathbb{F}_2$ and if $v_1 \ne v_2$ are nonzero vectors in V, prove that
$v_1, v_2$ is linearly independent. Is this true for vector spaces over any other field?

4.6 Prove that the columns of an m × n matrix A over a field k are linearly dependent
in $k^m$ if and only if the homogeneous system Ax = 0 has a nontrivial solution.

* 4.7 (i) Prove that the list of polynomials $1, x, x^2, x^3, \dots, x^{100}$ is a linearly
independent list in k[x], where k is a field.

(ii) Define $V_n = \langle 1, x, x^2, \dots, x^n \rangle$. Prove that $1, x, x^2, \dots, x^n$ is a basis of
$V_n$, and conclude that $\dim(V_n) = n + 1$.

4.8 It is shown in analytic geometry that if $\ell_1$ and $\ell_2$ are lines with slopes $m_1$ and $m_2$,
respectively, then $\ell_1$ and $\ell_2$ are perpendicular if and only if $m_1 m_2 = -1$. If

\[ \ell_i = \{\alpha v_i + u_i : \alpha \in \mathbb{R}\}, \]

for i = 1, 2, prove that $m_1 m_2 = -1$ if and only if the dot product $v_1 \cdot v_2 = 0$.
(Since both lines have slopes, neither of them is vertical.)

4.9 (i) In calculus, a line in space passing through a point u is defined as

\[ \{u + \alpha w : \alpha \in \mathbb{R}\} \subseteq \mathbb{R}^3, \]

where w is a fixed nonzero vector. Show that every line through u is a
coset of a one-dimensional subspace of $\mathbb{R}^3$.

(ii) In calculus, a plane in space passing through a point u is defined as the
subset

\[ \{v \in \mathbb{R}^3 : (v - u) \cdot n = 0\} \subseteq \mathbb{R}^3, \]

where $n \ne 0$ is a fixed normal vector and $(v - u) \cdot n$ is a dot product.
Prove that a plane through u is a coset of a two-dimensional subspace
of $\mathbb{R}^3$.

4.10 (i) Prove that $\dim(\mathrm{Mat}_{m \times n}(k)) = mn$.

(ii) Determine dim(S), where S is the subspace of $\mathrm{Mat}_n(k)$ consisting of all
the symmetric matrices.

4.11 Let $A \in \mathrm{Mat}_n(k)$. If the characteristic of k is not 2, then A is called skew-symmetric
if $A^T = -A$, where $A^T$ is the transpose of A. In case k has characteristic 2,
then A is skew-symmetric if it is symmetric and if all its diagonal
entries are 0.

(i) Prove that the subset K of $\mathrm{Mat}_n(k)$, consisting of all the skew-symmetric
matrices, is a subspace of $\mathrm{Mat}_n(k)$.

(ii) Determine dim(K).

4.12 If p is a prime with $p \equiv 1 \bmod 4$, prove that there is a nonzero vector $v \in (\mathbb{F}_p)^2$
with (v, v) = 0, where (v, v) is the usual inner product of v with itself [see Example
4.4(i)].

*4.13 Let k be a field, and let $k^n$ have the usual inner product. Prove that if
$v = a_1 e_1 + \cdots + a_n e_n$, then $a_i = (v, e_i)$ for all i.

*4.14 If $f(x) = c_0 + c_1 x + \cdots + c_m x^m \in k[x]$ and if $A \in \mathrm{Mat}_n(k)$, define

\[ f(A) = c_0 I + c_1 A + \cdots + c_m A^m \in \mathrm{Mat}_n(k). \]

Prove that there is some nonzero $f(x) \in k[x]$ with f(A) = 0.

4.15 (i) If U is a subspace of a vector space V over a field k, define a scalar
multiplication on the cosets in the quotient group V/U by

\[ \alpha(v + U) = \alpha v + U, \]

where $\alpha \in k$ and $v \in V$. Prove that this is a well-defined function that
makes V/U into a vector space over k (V/U is called a quotient space).

(ii) Prove that the natural map $\pi : V \to V/U$, defined by $\pi(v) = v + U$, is
a linear transformation.

*4.16 If V is a finite-dimensional vector space and U is a subspace, prove that

\[ \dim(U) + \dim(V/U) = \dim(V). \]

*4.17 Let Ax = b be a linear system of equations, and let s be a solution. If U is the
solution space of the homogeneous linear system Ax = 0, prove that every solution
of Ax = b has a unique expression of the form s + u for $u \in U$. Conclude that the
solution set of Ax = b is the coset s + U.

*4.18 If U and W are subspaces of a vector space V, define

\[ U + W = \{u + w : u \in U \text{ and } w \in W\}. \]

(i) Prove that U + W is a subspace of V.

(ii) If U and U' are subspaces of a finite-dimensional vector space V, prove
that

\[ \dim(U) + \dim(U') = \dim(U \cap U') + \dim(U + U'). \]

(iii) A subspace $U \subseteq V$ has a complement S if $S \subseteq V$ is a subspace such that
U + S = V and $U \cap S = \{0\}$; one says that U is a direct summand of V
if U has a complement. If V is finite-dimensional, prove that every subspace
U of V is a direct summand. (This is true for infinite-dimensional
vector spaces as well, but a proof requires Zorn's lemma.)

* 4.19 If U and W are vector spaces over a field k, define their direct sum to be the set of
all ordered pairs,

\[ U \oplus W = \{(u, w) : u \in U \text{ and } w \in W\}, \]

with addition (u, w) + (u', w') = (u + u', w + w') and scalar multiplication
$\alpha(u, w) = (\alpha u, \alpha w)$.

(i) Prove that $U \oplus W$ is a vector space.

(ii) If U and W are finite-dimensional vector spaces over a field k, prove that

\[ \dim(U \oplus W) = \dim(U) + \dim(W). \]

* 4.20 Assume that V is an n-dimensional vector space over a field k, and that V has a
nondegenerate inner product. If W is an r-dimensional subspace of V, prove that
$V = W \oplus W^{\perp}$. (See Example 4.5.) Conclude that $\dim(W^{\perp}) = n - r$.

4.21 Here is a theorem of Pappus holding in $k^2$, where k is a field. Let $\ell$ and m be
distinct lines, let $A_1, A_2, A_3$ be distinct points on $\ell$, and let $B_1, B_2, B_3$ be distinct
points on m. Define $C_1$ to be $A_2B_3 \cap A_3B_2$, $C_2$ to be $A_1B_3 \cap A_3B_1$, and $C_3$ to be
$A_1B_2 \cap A_2B_1$. Then $C_1, C_2, C_3$ are collinear.

State the dual of the theorem of Pappus.


Gaussian Elimination 

The following homogeneous system of equations over a field k can be solved at
once:

\[
\begin{aligned}
x_1 + b_{1,m+1} x_{m+1} + \cdots + b_{1n} x_n &= 0 \\
x_2 + b_{2,m+1} x_{m+1} + \cdots + b_{2n} x_n &= 0 \\
&\ \,\vdots \\
x_m + b_{m,m+1} x_{m+1} + \cdots + b_{mn} x_n &= 0.
\end{aligned}
\]

Replacing $x_{m+1}, \dots, x_n$ by constants $c_{m+1}, \dots, c_n \in k$, we have

\[ x_i = -\sum_{j=m+1}^{n} b_{ij} c_j \quad \text{for all } i \le m, \]

so that an arbitrary solution has the form

\[ \Bigl( -\sum_{j=m+1}^{n} b_{1j} c_j, \ \dots, \ -\sum_{j=m+1}^{n} b_{mj} c_j, \ c_{m+1}, \ \dots, \ c_n \Bigr). \]

The coefficient matrix B of this system,

\[
B = \begin{bmatrix}
1 & 0 & \cdots & 0 & b_{1,m+1} & \cdots & b_{1n} \\
0 & 1 & \cdots & 0 & b_{2,m+1} & \cdots & b_{2n} \\
\vdots & \vdots & & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 1 & b_{m,m+1} & \cdots & b_{mn}
\end{bmatrix},
\]

is an example of a matrix in echelon form.

Definition. An m × n matrix B is in row reduced echelon form⁵ if

(i) each row of all zeros, if any, lies below every nonzero row;

(ii) the leading entry of each nonzero row (its first nonzero entry) is 1;

(iii) every other entry in a leading column (a column containing a leading
entry) is 0;

(iv) the leading columns are $\mathrm{COL}(t_1), \dots, \mathrm{COL}(t_r)$, where $t_1 < t_2 < \cdots < t_r$
and $r \le m$.

We say that B is in echelon form if the leading columns are COL(1), ..., COL(r);
that is, $t_i = i$ for all $i \le r$.

Definition. There are three elementary row operations $A \xrightarrow{o} A'$ changing a
matrix A into a matrix A':

Type I: o adds a scalar multiple of one row of A to another row; that is,
o replaces ROW(i) by ROW(i) + cROW(j), where $c \in k$ and $j \ne i$;

Type II: o multiplies one row of A by a nonzero $c \in k$; that is, o replaces
ROW(i) by cROW(i), where $c \in k$ and $c \ne 0$;

Type III: o interchanges two rows of A.

There are analogous elementary column operations on a matrix.

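An interchange (Type III) can be accomplished by operations of types I and
II (in spite of this redundancy, interchanges are still regarded as elementary
operations because they arise frequently). We indicate this schematically.

\[
\begin{bmatrix} a & b \\ c & d \end{bmatrix} \to
\begin{bmatrix} a - c & b - d \\ c & d \end{bmatrix} \to
\begin{bmatrix} a - c & b - d \\ a & b \end{bmatrix} \to
\begin{bmatrix} -c & -d \\ a & b \end{bmatrix} \to
\begin{bmatrix} c & d \\ a & b \end{bmatrix}
\]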
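
The four steps in the schematic can be checked mechanically. The following is a minimal sketch; the function names are hypothetical and rows are indexed from 0, which is a convention of the code and not of the text.

```python
def add_multiple(M, i, j, c):
    """Type I: replace ROW(i) by ROW(i) + c*ROW(j), where j != i."""
    M = [row[:] for row in M]                      # work on a copy
    M[i] = [x + c * y for x, y in zip(M[i], M[j])]
    return M

def scale(M, i, c):
    """Type II: replace ROW(i) by c*ROW(i), where c is nonzero."""
    if c == 0:
        raise ValueError("c must be nonzero")
    M = [row[:] for row in M]
    M[i] = [c * x for x in M[i]]
    return M

def swap_via_types_I_and_II(M, i, j):
    """Interchange ROW(i) and ROW(j) using only Type I and Type II operations."""
    M = add_multiple(M, i, j, -1)   # ROW(i) becomes ROW(i) - ROW(j)
    M = add_multiple(M, j, i, 1)    # ROW(j) becomes the old ROW(i)
    M = add_multiple(M, i, j, -1)   # ROW(i) becomes minus the old ROW(j)
    return scale(M, i, -1)          # ROW(i) becomes the old ROW(j)

swapped = swap_via_types_I_and_II([[1, 2], [3, 4]], 0, 1)
```

Each intermediate matrix produced here corresponds to one arrow in the schematic above.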

⁵ The word echelon means "wing," for the staggering of the leading entries suggests the
shape of a bird's wing.

Proposition 4.33. If $A \to A'$ is an elementary row operation, then A and A'
have the same row space: Row(A) = Row(A').

Proof. Suppose that $A \to A'$ is an elementary operation of Type I. The row
space of A is $\mathrm{Row}(A) = \langle \alpha_1, \dots, \alpha_m \rangle$, where $\alpha_i$ is the ith row of A; the row
space Row(A') is spanned by $\alpha_i + c\alpha_j$ and the other rows $\alpha_\ell$ with $\ell \ne i$, where $c \in k$
and $j \ne i$. It is obvious that $\mathrm{Row}(A') \subseteq \mathrm{Row}(A)$. For the reverse inclusion,
observe that $\alpha_i = (\alpha_i + c\alpha_j) - c\alpha_j \in \mathrm{Row}(A')$.

If $A \to A'$ is an elementary operation of Type II, then Row(A') is spanned
by $c\alpha_i$ and the other rows $\alpha_\ell$ with $\ell \ne i$, where $c \ne 0$. It is obvious that
$\mathrm{Row}(A') \subseteq \mathrm{Row}(A)$. For the reverse inclusion, observe that
$\alpha_i = c^{-1}(c\alpha_i) \in \mathrm{Row}(A')$.

There is no need to consider elementary operations of Type III, for we have
already seen that any such can be obtained as a sequence of elementary operations
of the other two types. •

Definition. If A is an m × n matrix over a field k with row space Row(A), then

\[ \mathrm{rank}(A) = \dim(\mathrm{Row}(A)). \]


Corollary 4.34. If $A \to A'$ is an elementary row operation, then

\[ \mathrm{rank}(A) = \mathrm{rank}(A'). \]

Proof. Even more is true; the row spaces of A and of A' are equal, and so they
certainly have the same dimension. •

We remark that if $A \to A'$ is an elementary row operation, then A and A'
may not have the same column space. For example, consider
$\begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix} \to \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$.

It is not obvious, but it is true, that both the row space and the column space
of a matrix have the same dimension (see Corollary 4.83).

We are going to show that if $A \to A'$ is an elementary row operation, then
the homogeneous systems Ax = 0 and A'x = 0 have the same solution space.
To prove this, we introduce elementary matrices.

Definition. Let o be an elementary row operation, so that A' can be denoted by
A' = o(A). An elementary matrix is an m × m matrix E of the form E = o(I),
where I is the m × m identity matrix. If o is of Type I, II, or III, then we say that
o(I) is an elementary matrix of Type I, II, or III.

Here are the 2 × 2 elementary matrices.

\[
\begin{bmatrix} 1 & 0 \\ c & 1 \end{bmatrix}, \quad
\begin{bmatrix} 1 & c \\ 0 & 1 \end{bmatrix}, \quad
\begin{bmatrix} c & 0 \\ 0 & 1 \end{bmatrix}, \quad
\begin{bmatrix} 1 & 0 \\ 0 & c \end{bmatrix}, \quad
\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}
\]

Applying elementary column operations to the identity matrix yields the
same family of elementary matrices.

The next lemma shows that the effect of an elementary row operation on a
matrix A is the same as multiplying A on the left by an elementary matrix, while
the effect of an elementary column operation on A is the same as multiplying A
on the right by an elementary matrix.


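Lemma 4.35. If A is an m × n matrix and if $A \to A'$ is an elementary row
operation, then o(A) = o(I)A; if $A \to A'$ is an elementary column operation,
then o(A) = Ao(I).


Proof. We will merely illustrate the result, leaving the proof to the reader.

\[
\text{Type I:} \quad
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ u & 0 & 1 \end{bmatrix}
\begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}
=
\begin{bmatrix} a & b & c \\ d & e & f \\ ua + g & ub + h & uc + i \end{bmatrix}
\]

\[
\text{Type II:} \quad
\begin{bmatrix} 1 & 0 & 0 \\ 0 & u & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}
=
\begin{bmatrix} a & b & c \\ ud & ue & uf \\ g & h & i \end{bmatrix}
\]

As before, the result is true for elementary row operations of Type III.
We illustrate an elementary column operation.

\[
\text{Type I:} \quad
\begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ u & 0 & 1 \end{bmatrix}
=
\begin{bmatrix} a + cu & b & c \\ d + fu & e & f \\ g + iu & h & i \end{bmatrix}
\]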
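
Lemma 4.35 can also be spot-checked numerically for Type I operations. The sketch below is an illustration only; the helper names are hypothetical, indices start at 0, and the sample matrix and scalar are arbitrary choices.

```python
def identity(n):
    """The n x n identity matrix as a list of rows."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def matmul(X, Y):
    """Ordinary matrix product XY."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def row_op(M, i, j, c):
    """Type I row operation: replace ROW(i) by ROW(i) + c*ROW(j)."""
    M = [row[:] for row in M]
    M[i] = [x + c * y for x, y in zip(M[i], M[j])]
    return M

def col_op(M, i, j, c):
    """Type I column operation: replace COL(i) by COL(i) + c*COL(j)."""
    M = [row[:] for row in M]
    for row in M:
        row[i] = row[i] + c * row[j]
    return M

A = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
u = 5
# Row operation: o(A) should equal o(I)A (multiplication on the left).
row_ok = row_op(A, 2, 0, u) == matmul(row_op(identity(3), 2, 0, u), A)
# Column operation: o(A) should equal A o(I) (multiplication on the right).
col_ok = col_op(A, 0, 2, u) == matmul(A, col_op(identity(3), 0, 2, u))
```

Both operations applied to the identity give the same elementary matrix, with u in the lower left corner, which matches the two Type I illustrations above.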


Recall that an n × n matrix A is nonsingular if there exists an n × n matrix
B such that AB = I and BA = I; one calls B the inverse of A and denotes it
by $A^{-1}$.


Proposition 4.36. Every elementary matrix E is a nonsingular matrix. In fact,
$E^{-1}$ is an elementary matrix of the same type as E.

Proof. If o is an elementary row operation of Type I, then o replaces ROW(i) by
ROW(i) + cROW(j). Define o' to be the elementary row operation which replaces
ROW(i) by ROW(i) − cROW(j). The inverse of the elementary matrix E = o(I)
is E' = o'(I), for o'(o(I)) = I, so that I = o'(o(I)) = o'(E) = E'E, and o(o'(I)) = I,
so that I = o(o'(I)) = o(E') = EE'. Note that E' is an elementary matrix of Type I.

If o is an elementary row operation of Type II, then o replaces ROW(i) by
cROW(i). Define o' to be the elementary row operation which replaces ROW(i)
by $c^{-1}$ROW(i) (this is why we insist that $c \ne 0$). The inverse of the elementary
matrix E = o(I) is E' = o'(I), for o'(o(I)) = I, so that I = o'(o(I)) = o'(E) = E'E,
and o(o'(I)) = I, so that I = o(o'(I)) = o(E') = EE'. Note that E' is an elementary
matrix of Type II.

An elementary matrix E of Type III is equal to its own inverse: EE = I. •

The next proposition is the key to Gaussian elimination.

Proposition 4.37. If $A \to A'$ is an elementary row operation, then the linear
systems Ax = 0 and A'x = 0 have the same solution space.

Proof. Let S and S' be the solution spaces of Ax = 0 and A'x = 0, respectively.
If A' = o(A), then A' = EA, by Lemma 4.35, where E is the elementary matrix
o(I). If $v \in S$, then Av = 0; hence, 0 = EAv = A'v, and so $v \in S'$. The reverse
inclusion follows from the equation $A = E^{-1}A'$, for $E^{-1}$ is also an elementary
matrix. •

Corollary 4.38. If A and B are m × n matrices over a field k, and if there is a
sequence of elementary row operations

\[ A = A_0 \to A_1 \to \cdots \to A_p = B, \]

then there is a nonsingular matrix P with B = PA. If there is a sequence of
elementary column operations

\[ B = B_0 \to B_1 \to \cdots \to B_q = C, \]

then there is a nonsingular matrix Q with C = BQ.

Proof. There are elementary matrices $E_i$ with $A_i = E_i A_{i-1}$ for all $i \ge 1$.
Therefore, $B = E_p \cdots E_2 E_1 A$. Define $P = E_p \cdots E_2 E_1$, so that B = PA.
Now P is nonsingular, for the product of nonsingular matrices is nonsingular
[$(E_p \cdots E_2 E_1)^{-1} = E_1^{-1} E_2^{-1} \cdots E_p^{-1}$]. The second statement is proved
similarly. •

Definition. If $\sigma \in S_n$ is a permutation, then an n × n matrix $Q_\sigma$ is called a
permutation matrix if it arises from the n × n identity matrix I by permuting its
columns by $\sigma$.

If $\tau \in S_n$ is a transposition, then $Q_\tau$ interchanges two columns, and so it
is an elementary matrix of Type III. Since every permutation $\sigma$ is a product of
transpositions (Proposition 2.35), $Q_\sigma$ is a product of elementary matrices.

If Ax = 0 is a homogeneous system, then the columns of A correspond
to the labels on the variables: COL(i) corresponds to $x_i$. Thus, $AQ_\sigma$, which is
the matrix whose columns have been permuted by $\sigma$, corresponds to the "same"
homogeneous system $(AQ_\sigma)y = 0$ with variables $y_i = x_{\sigma(i)}$.

Definition. A matrix A is Gaussian equivalent to a matrix B if there is a
sequence of elementary row operations

\[ A = A_0 \to A_1 \to \cdots \to A_p = B. \]

It is easy to show that Gaussian equivalence is an equivalence relation on the
set of all m × n matrices.

Theorem 4.39 (Gaussian Elimination).

(i) Every m × n matrix A over a field k is Gaussian equivalent to a matrix B
in row reduced echelon form.

(ii) The matrix B in part (i) is uniquely determined by A.

(iii) There are a nonsingular matrix P and a permutation matrix $Q_\sigma$ with $PAQ_\sigma$
in echelon form.

Proof.

(i) The proof is by induction on n, the number of columns of A. Let n = 1. If
A = 0, we are done. If $A \ne 0$, then $a_{j1} \ne 0$ for some j. Multiply ROW(j) by
$a_{j1}^{-1}$, and then interchange ROW(j) with ROW(1), so that the new matrix
$A' = [a'_{p1}]$ has $a'_{11} = 1$. For each p > 1, replace $a'_{p1}$ by $a'_{p1} - a'_{p1} a'_{11} = 0$. We have
arrived at an m × 1 row reduced echelon matrix, for the entry in its first row is 1,
while all the other rows are 0.

For the inductive step, let A be an m × (n + 1) matrix. If the first column
of A is 0, then put the matrix comprised of the last n columns into row reduced
echelon form, by induction. The resulting matrix is itself in row reduced echelon
form. If the first column of A is not 0, put its first column in row reduced echelon
form (as in the base step), so that the new matrix is
$A' = \begin{bmatrix} 1 & Y \\ 0 & M \end{bmatrix}$, where M is an
(m − 1) × n matrix. Your first guess is to apply the inductive hypothesis to the
matrix comprised of the last n columns, as in the first case. This may not be
convenient, for one of the elementary row operations may have added a multiple
of ROW(1) to another row, thereby changing the first column. Instead, we use
the inductive hypothesis to replace M by D, where D is a row reduced echelon
matrix Gaussian equivalent to M. Thus, A' is Gaussian equivalent to
$N = \begin{bmatrix} 1 & Y \\ 0 & D \end{bmatrix}$.
Let the leading columns of N coming from D be $\mathrm{COL}(t_2), \dots, \mathrm{COL}(t_r)$, where
$2 \le t_2 < \cdots < t_r$ (the first column of D is the second column of N). It is possible
that the entry $y_{1,t_2} \ne 0$. If so, replace ROW(1) of N by ROW(1) − $y_{1,t_2}$ROW(2) (the first row of
D is the second row of N). Since the leading entry of $\mathrm{COL}(t_2)$ has only 0 entries
to its left, this operation does not change any columns of N to the left of $\mathrm{COL}(t_2)$.
Thus, the leading entry of $\mathrm{COL}(t_2)$ is now the only nonzero entry in its column,
while COL(1) has not been changed. Next, make $y_{1,t_3} = 0$ in the same way
(so that COL(1) and $\mathrm{COL}(t_2)$ are unchanged), and continue until all $y_{1,t_j} = 0$.
We have arrived at a row reduced echelon matrix with leading columns COL(1),
$\mathrm{COL}(t_2), \dots, \mathrm{COL}(t_r)$.

(ii) Suppose that B is a row reduced echelon matrix Gaussian equivalent to A.
Let the nonzero rows of B be $\beta_1, \dots, \beta_r$, let the leading columns of B be
$\mathrm{COL}(t_1), \dots, \mathrm{COL}(t_r)$, and let $\beta_i = e_{t_i} + u_i$, where $u_i \in \langle e_\nu : \nu > t_i \rangle$ (as usual,
$e_1, \dots, e_n$ is the standard basis of $k^n$). We claim that $\mathrm{COL}(t_1), \dots, \mathrm{COL}(t_r)$ are
precisely those columns of B in which the leading entry of a nonzero vector in
$\langle \beta_1, \dots, \beta_r \rangle = \mathrm{Row}(B)$ can occur. It will then follow that the leading columns
are determined by Row(B). Clearly, $\mathrm{COL}(t_i)$ contains a leading entry (namely,
that of $\beta_i$). On the other hand, if $\gamma$ is a nonzero vector in $\langle \beta_1, \dots, \beta_r \rangle$, then
$\gamma = c_1\beta_1 + \cdots + c_r\beta_r$. If we picture each $\beta_i$ as the ith row of B, then multiply
ROW(i) by $c_i$ and add: $\gamma$ is the sum, and its jth coordinate is just the sum of the
entries in the jth column. Thus, for each i, the $t_i$th coordinate of $\gamma$ is $c_i$, for there
is no other nonzero entry in $\mathrm{COL}(t_i)$. Since $\gamma \ne 0$, some $c_i \ne 0$; we claim that
the first such, $c_{i_0}$, is its leading coefficient. Now all $c_i = 0$ for $i < i_0$, and so
$\gamma = c_{i_0} e_{t_{i_0}} + \omega$, where $\omega = c_{i_0} u_{i_0} + \sum_{i > i_0} c_i \beta_i \in \langle e_\nu : \nu > t_{i_0} \rangle$. Hence, the leading
coefficient of $\gamma$ lies in $\mathrm{COL}(t_{i_0})$.

If B' is another row reduced echelon matrix Gaussian equivalent to A, then
Row(B') = Row(B), by Proposition 4.33. Since we have just proved that the
row space determines the leading columns, it follows that the leading columns
of B' and of B are the same. Let the nonzero rows of B' be $\beta'_1, \dots, \beta'_r$. Now
$\beta'_i \in \mathrm{Row}(B') = \mathrm{Row}(B)$, so there are $c_\nu \in k$ with $\beta'_i = \sum_\nu c_\nu \beta_\nu$. But we saw
in the preceding paragraph that for each $\nu$, the $t_\nu$th coordinate of $\beta'_i$ is $c_\nu$. Hence,
$c_i = 1$ and all other $c_\nu = 0$, so that $\beta'_i = \beta_i$ for all i. Therefore, B' = B.

(iii) Choose $\sigma$ to be a permutation with $\sigma(t_i) = i$ for i = 1, ..., r. •
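
Part (i) is constructive, and a column-by-column version of it is easy to program. The sketch below is one possible implementation, not the recursive argument of the proof: it sweeps the columns from left to right and uses the three types of elementary row operations. It assumes a nonempty matrix with rational entries, uses exact arithmetic via `fractions.Fraction`, and indexes rows and columns from 0; the name `rref` is a hypothetical helper.

```python
from fractions import Fraction

def rref(A):
    """Return (B, lead_cols): the row reduced echelon form of A and its leading columns."""
    B = [[Fraction(x) for x in row] for row in A]
    m, n = len(B), len(B[0])
    lead_cols = []
    r = 0                                    # next row that will receive a leading 1
    for t in range(n):
        # Look for a nonzero entry in column t, at or below row r.
        pivot = next((p for p in range(r, m) if B[p][t] != 0), None)
        if pivot is None:
            continue                         # column t is not a leading column
        B[r], B[pivot] = B[pivot], B[r]      # Type III: interchange two rows
        c = B[r][t]
        B[r] = [x / c for x in B[r]]         # Type II: make the leading entry 1
        for p in range(m):
            if p != r and B[p][t] != 0:      # Type I: clear the rest of column t
                f = B[p][t]
                B[p] = [x - f * y for x, y in zip(B[p], B[r])]
        lead_cols.append(t)
        r += 1
        if r == m:
            break
    return B, lead_cols

B, lead_cols = rref([[0, 2, 4], [1, 1, 1]])
```

By part (ii), any correct sequence of elementary row operations must arrive at the same matrix B, so the order in which this sketch chooses its pivots does not affect the result.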

Corollary 4.40. Let A be an m × n matrix over a field k. If B is the row reduced
echelon form of A, then a basis of Row(A) consists of the nonzero rows of B.

Proof. Now Row(A) = Row(B), by Proposition 4.33, and so it is spanned
by the rows of B. But it is obvious that the nonzero rows of B are linearly
independent, and so they form a basis. •

Definition. If B is an m × n matrix in row reduced echelon form with leading
columns $\mathrm{COL}(t_1), \dots, \mathrm{COL}(t_r)$, then $x_{t_1}, \dots, x_{t_r}$ are called fixed variables (or
lead variables) and the other variables are called free variables.

Recall that Gaussian elimination is the method of solving problems in linear 
algebra by replacing a matrix A by the row reduced echelon matrix U which is 
Gaussian equivalent to it. For example, if one replaces the coefficient matrix A 
of the system Ax = 0, then Proposition 4.37 says that Ux = 0 has the same 
solution space as Ax = 0. Now Ux = 0 is easily solved, as on page 345. 

The next theorem looks more complicated than it really is, and we shall
simplify its statement after the proof.

Theorem 4.41. Let Ax = 0 be a system of linear equations, where A is an
m × n matrix over a field k, and let B be the (unique) row reduced echelon
form Gaussian equivalent to A. Let the fixed variables be $x_{t_1}, \dots, x_{t_r}$, let the
free variables be $x_{p_1}, \dots, x_{p_{n-r}}$, and let the nonzero rows of B be $\beta_i = e_{t_i} + u_i$,
where $u_i = \sum_\ell b_{i p_\ell} e_{p_\ell}$. Then Sol(A), the solution space of Ax = 0, consists of
all vectors of the form

\[ \sum_{\ell} c_{p_\ell} e_{p_\ell} - \sum_{i} \Bigl( \sum_{\ell} b_{i p_\ell} c_{p_\ell} \Bigr) e_{t_i}. \]

Proof. By Proposition 4.37, an n-tuple $s = (c_1, \dots, c_n)$ is a solution of Ax = 0
if and only if it is a solution of Bx = 0. Now, for $i \le r$, the ith entry of the m × 1 matrix
Bs is $c_{t_i} + \sum_\ell b_{i p_\ell} c_{p_\ell}$, where $p_\ell$ ranges over all the free variables (the remaining
entries are 0). Therefore, s is a solution if and only if $c_{t_i} = -\sum_\ell b_{i p_\ell} c_{p_\ell}$ for all i. •

The notation describing the solutions of Bx = 0 would be simpler if B were
in echelon form; that is, if the leading columns of B were its first r columns.
By Theorem 4.39, there is a permutation $\sigma$ such that $BQ_\sigma$ is in echelon form.
But permuting the columns merely relabels the variables, and so it is no loss in
generality to consider the notationally simpler case in which the first r variables
are the fixed variables and the last n − r variables are the free variables. In this
case, the solutions are

\[ \Bigl( -\sum_{\ell=r+1}^{n} b_{1\ell} c_\ell, \ \dots, \ -\sum_{\ell=r+1}^{n} b_{r\ell} c_\ell, \ c_{r+1}, \ \dots, \ c_n \Bigr). \]

The next result is often called the rank-nullity theorem.

Theorem 4.42. Let A be an m × n matrix over a field k. If Sol(A) is the solution
space of the homogeneous linear system Ax = 0, then

\[ \dim(\mathrm{Sol}(A)) = n - r, \]

where r = rank(A).

Proof. Let us assume that the variables have been relabeled so that the fixed
variables precede all the free variables. For each $\ell$ with $1 \le \ell \le n - r$, define $s_\ell$
to be the solution $(c_1, \dots, c_n)$ with $c_{p_\ell} = 1$ and $c_{p_\nu} = 0$ for all $\nu \ne \ell$. Thus,

\[
\begin{aligned}
s_1 &= (-b_{1,r+1}, -b_{2,r+1}, \dots, -b_{r,r+1}, 1, 0, \dots, 0) \\
s_2 &= (-b_{1,r+2}, -b_{2,r+2}, \dots, -b_{r,r+2}, 0, 1, \dots, 0) \\
&\ \,\vdots \\
s_{n-r} &= (-b_{1,n}, -b_{2,n}, \dots, -b_{r,n}, 0, 0, \dots, 1).
\end{aligned}
\]

These n − r vectors are linearly independent (look at their last n − r coordinates),
while Theorem 4.41 shows that they span Sol(A):

\[ \Bigl( -\sum_{\ell=r+1}^{n} b_{1\ell} c_\ell, \ \dots, \ -\sum_{\ell=r+1}^{n} b_{r\ell} c_\ell, \ c_{r+1}, \ \dots, \ c_n \Bigr) = \sum_{\ell=r+1}^{n} c_\ell s_{\ell - r}. \quad • \]


The dimension n − r of the solution space of the system Ax = 0 is often called
the number of degrees of freedom in the general solution.


Example 4.43.

Consider the matrix

\[
A = \begin{bmatrix}
0 & 0 & 1 & 1 & 0 \\
-2 & -4 & 1 & 0 & -3 \\
3 & 6 & -1 & 1 & 5
\end{bmatrix}.
\]

Find rank(A), find a basis of its row space, and find a basis of the solution space
of the homogeneous system Ax = 0.

The matrix A is Gaussian equivalent to

\[
B = \begin{bmatrix}
1 & 2 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 & -1 \\
0 & 0 & 0 & 1 & 1
\end{bmatrix}.
\]

Thus, rank(A) = 3 and a basis of the row space is (1, 2, 0, 0, 1), (0, 0, 1, 0, −1),
(0, 0, 0, 1, 1). Note that the rows of A are also linearly independent, for they
span a 3-dimensional space (see Corollary 4.24). The fixed variables are $x_1$, $x_3$,
and $x_4$, while the free variables are $x_2$ and $x_5$. The solution space has dimension
5 − 3 = 2. The system Bx = 0 is

\[
\begin{aligned}
x_1 + 2x_2 + x_5 &= 0 \\
x_3 - x_5 &= 0 \\
x_4 + x_5 &= 0.
\end{aligned}
\]

The general solution is (−2c − d, c, d, −d, d). ◄
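
The recipe of Theorems 4.41 and 4.42 for reading a basis of the solution space off the row reduced echelon form can be sketched in a few lines. This is an illustration only: the function name `solution_basis` is hypothetical, the input is assumed to be already in row reduced echelon form, columns are indexed from 0, and the matrices A and B below are those of Example 4.43 as read from the text.

```python
def solution_basis(B):
    """Basis of the solution space of Bx = 0, for B in row reduced echelon form."""
    n = len(B[0])
    rows = [row for row in B if any(x != 0 for x in row)]            # nonzero rows
    # Leading column t_i of each nonzero row (position of its first nonzero entry).
    lead = [next(j for j, x in enumerate(row) if x != 0) for row in rows]
    free = [j for j in range(n) if j not in lead]                    # free variables
    basis = []
    for p in free:
        s = [0] * n
        s[p] = 1                         # this free variable is 1, the others are 0
        for i, t in enumerate(lead):
            s[t] = -rows[i][p]           # fixed variable x_t is minus the entry b_{ip}
        basis.append(s)
    return basis

A = [[0, 0, 1, 1, 0], [-2, -4, 1, 0, -3], [3, 6, -1, 1, 5]]
B = [[1, 2, 0, 0, 1], [0, 0, 1, 0, -1], [0, 0, 0, 1, 1]]
basis = solution_basis(B)

# Every basis vector should also solve the original system Ax = 0.
residuals = [[sum(a * x for a, x in zip(row, s)) for row in A] for s in basis]
```

The two vectors found this way are the coefficient vectors of c and d in the general solution (−2c − d, c, d, −d, d).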

Theorem 4.42, the rank-nullity theorem, should be compared with Exercise
4.20 on page 345. If W = Sol(A), then $\dim(W) = n - \dim(\mathrm{Sol}(A)^{\perp})$. But
we reported, in Example 4.5, that $\mathrm{Sol}(A)^{\perp} = \mathrm{Row}(A)$, and so
$\dim(\mathrm{Sol}(A)^{\perp}) = \dim(\mathrm{Row}(A)) = \mathrm{rank}(A)$.

The exercises will show how to use Gaussian elimination to solve other problems
in linear algebra.

Exercises


4.22 (i) Prove that a list $v_1, \dots, v_m$ in a vector space V is linearly independent if
and only if it spans an m-dimensional subspace of V.

(ii) Determine whether the list $v_1 = (1, 1, -1, 2)$, $v_2 = (2, 2, -3, 1)$,
$v_3 = (-1, -1, 0, -5)$ in $k^4$ is linearly independent.

4.23 Do the vectors $v_1 = (1, 4, 3)$, $v_2 = (-1, -2, 0)$, $v_3 = (2, 2, 3)$ span $k^3$?

4.24 Let k be a field, and let A be an n × m matrix over k. Call an (inhomogeneous)
linear system $Ax = \beta$, where $\beta \in k^n$, consistent if there is $v \in k^m$ with $Av = \beta$.
Prove that $Ax = \beta$ is consistent if and only if $\beta$ lies in the column space of A.
(Recall Exercise 4.17 on page 344: the solution set of a consistent inhomogeneous
system Ax = b is a coset of the solution space of Ax = 0.)

4.25 If A is an n × n nonsingular matrix, prove that any system Ax = b has a unique
solution, namely, $x = A^{-1}b$.

4.26 Let $\alpha_1, \dots, \alpha_m$ be the columns of an n × m matrix A over a field k, and let $\beta \in k^n$.

(i) Prove that $\beta \in \langle \alpha_1, \dots, \alpha_m \rangle$ if and only if the inhomogeneous system
$A^T x = \beta$ has a solution.

(ii) Define the augmented matrix $[A^T \mid \beta^T]$ to be the n × (m + 1) matrix
whose first m columns are $A^T$ and whose last column is $\beta^T$. Prove that $\beta$
lies in the column space of $A^T$ if and only if $\mathrm{rank}([A^T \mid \beta^T]) = \mathrm{rank}(A)$.

(iii) Does $\beta = (0, -3, 5)$ lie in the subspace spanned by $\alpha_1 = (0, -2, 3)$,
$\alpha_2 = (0, -4, 6)$, $\alpha_3 = (1, 1, -1)$?

4.27 (i) Prove that an n × n matrix A over a field k is nonsingular if and only if it
is Gaussian equivalent to the identity I.


(ii) Find the inverse of 



3 1 

1 0 
0 1 


4.2 Euclidean Constructions 

There are myths in several ancient civilizations in which the gods demand pre- 
cise solutions of mathematical problems in return for granting relief from catas- 
trophes. We quote from van der Waerden, Geometry and Algebra in Ancient 
Civilizations. 

In the dialogue 'Platonikos' of Eratosthenes, a story was told about
the problem of doubling the cube. According to this story, as Theon
of Smyrna recounts it in his book 'Exposition of mathematical things
useful for the reading of Plato', the Delians asked for an oracle in
order to be liberated from a plague. The god (Apollo) answered
through the oracle that they had to construct an altar twice as large
as the existing one without changing its shape. The Delians sent a
delegation to Plato, who referred them to the mathematicians Eudoxos
and Helikon of Kyzikos.

The altar was cubical in shape, and so the problem involves constructing $\sqrt[3]{2}$.
The gods were cruel, for although there is a geometric construction of $\sqrt{2}$ (it
is the length of the diagonal of a square with sides of length 1), we are going to
prove that it is impossible to construct $\sqrt[3]{2}$ by the methods of euclidean geometry,
that is, by using only straightedge and compass. (Actually, the gods were not
so cruel, for the Greeks did use other methods. Thus, Menaechmus constructed
$\sqrt[3]{2}$ with the intersection of the parabolas $y^2 = 2x$ and $x^2 = y$; this is elementary
for us, but it was an ingenious feat when there was no analytic geometry and no
algebra. There was also a solution found by Nicomedes.)
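
In modern notation, the reason the two parabolas of Menaechmus produce $\sqrt[3]{2}$ is a one-line substitution; the short derivation below is added only as a check of the claim.

```latex
% Substitute y = x^2 (second parabola) into y^2 = 2x (first parabola).
\[
  y = x^2 \quad\text{and}\quad y^2 = 2x
  \;\Longrightarrow\; x^4 = 2x
  \;\Longrightarrow\; x\,(x^3 - 2) = 0 .
\]
% Besides the origin, the parabolas meet where x^3 = 2.
\[
  x = \sqrt[3]{2}, \qquad y = x^2 = \sqrt[3]{4} .
\]
```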

There are several other geometric problems handed down from the Greeks.
Can one trisect every angle? Can one construct a regular n-gon? Can one "square
the circle"; that is, can one construct a square whose area is equal to the area of
a given circle?

Notation. Let P and Q be points in the plane; we denote the line segment with
endpoints P and Q by PQ, and we denote the length of this segment by |PQ|.

Let L[P, Q] denote the line determined by P and Q, and let C[P; PQ]
denote the circle with center P and radius |PQ|.

If we do not give a precise definition of constructibility, then some of the
classical problems appear ridiculously easy. For example, a 60° angle can be
trisected using a protractor: just find 20° and draw the angle. Thus, it is essential
to state the problems carefully and to agree on certain ground rules. The Greek
problems specify that only two tools are allowed, and each must be used in only
one way. Given distinct points P and Q in the plane, a straightedge is a tool that
can draw the line L[P, Q]; a compass is a tool that draws the circle C[P; PQ]
with radius |PQ| and center P or the circle C[Q; QP] = C[Q; PQ] with center
Q and radius |QP|. Since every construction has only a finite number of steps,
we shall be able to define "constructible" points inductively.

What we are calling a straightedge, others call a ruler; we use the first term
to avoid possible confusion, for a ruler has the following extra property: one
can mark two points on it, say, U and V, and the marked point U is allowed to
slide along a circle. This added function of a ruler makes it a more powerful
instrument. For example, Nicomedes solved the Delian problem of doubling
the cube using a ruler and compass; both Nicomedes and Archimedes were able
to trisect arbitrary angles with these tools (we present Archimedes's proof later
in this section). On the other hand, we are going to show that both of these
constructions are impossible to do using only a straightedge and compass. (Some
angles, e.g., 90° and 45°, can be trisected using a straightedge and compass;
however, we are saying that there are some angles, e.g., 60°, that can never be so
trisected.) When we say impossible, we mean what we say; we do not mean that
it is merely very difficult. The reader should ponder how one might prove that
something is impossible. About 425 B.C., Hippias of Elis was able to square the
circle by drawing a certain curve as well as lines and circles. We shall see that
this construction is also impossible using only straightedge and compass.

Given the plane, we establish a coordinate system by first choosing two distinct
points, $A$ and $\bar{A}$; call the line they determine the x-axis. Use a compass
to draw the two circles $C[A; A\bar{A}]$ and $C[\bar{A}; \bar{A}A]$ of radius $|A\bar{A}|$ with centers
$A$ and $\bar{A}$, respectively. These two circles intersect in two points; the line they
determine is called the y-axis; it is the perpendicular bisector of $A\bar{A}$, and it
intersects the x-axis in a point O, called the origin. We define the distance |OA| to
be 1. We have introduced coordinates in the plane; in particular, $A = (1, 0)$ and
$\bar{A} = (-1, 0)$.


Figure 4.1 The First Constructible Points


Informally, one constructs a new point T from (not necessarily distinct) old
points P, Q, R, and S by using the first pair P, Q to draw a line or circle, the
second pair R, S to draw a line or circle, and then obtaining T as one of the points
of intersection of the two drawn lines, the drawn line and the drawn circle, or the
two drawn circles. More generally, a point is called constructible if it is obtained
from $A$ and $\bar{A}$ by a finite number of such steps. Given a pair of constructible
points, we do not assert that every point on the drawn line or the drawn circles
they determine is constructible.

Here is the formal discussion. Recall that if P and Q are distinct points in
the plane, then L[P, Q] is the line they determine and C[P; PQ] is the circle
with center P and radius |PQ|.

Definition. Let $E \ne F$ and $G \ne H$ be points in the plane. A point Z is built
from E, F, G, and H if one of the following conditions holds:

(i) $Z \in L[E, F] \cap L[G, H]$, where $L[E, F] \ne L[G, H]$;

(ii) $Z \in L[E, F] \cap C[G; GH]$ or $Z \in L[G, H] \cap C[E; EF]$;

(iii) $Z \in C[E; EF] \cap C[G; GH]$, where $C[E; EF] \ne C[G; GH]$.

Declare, once for all, that two points $A$ and $\bar{A}$ are constructible. A point Z is
constructible if $Z = A$ or $Z = \bar{A}$ or if there are points $P_1, \dots, P_n$ with $Z = P_n$
so that, for all $j \ge 1$, the point $P_{j+1}$ is built from points in $\{A, \bar{A}, P_1, \dots, P_j\}$.


Example 4.44. 

Let us show that $Z = (0, 1)$ is constructible. We have seen, in Figure 4.1,
that the points $P_2 = (0, \sqrt{3})$ and $P_3 = (0, -\sqrt{3})$ are
constructible, for both lie in $C[A; A\bar{A}] \cap C[\bar{A}; \bar{A}A]$,
and so the y-axis $L[P_2, P_3]$ can be drawn. Finally,

$Z = (0, 1) \in L[P_2, P_3] \cap C[O; OA]$. ◄

In our discussion, we shall freely use any standard result of euclidean
geometry. For example, every angle can be bisected with straightedge and
compass; i.e., if $(\cos\theta, \sin\theta)$ is constructible, then so is
$(\cos\frac{1}{2}\theta, \sin\frac{1}{2}\theta)$.

Definition. A complex number z = x + iy is constructible if the point (x, y ) 
is a constructible point. 

Example 4.44 shows that the numbers $1$, $-1$, $0$, $i\sqrt{3}$, $-i\sqrt{3}$,
$i$, and $-i$ are constructible numbers.

Lemma 4.45. A complex number z = x + iy is constructible if and only if its 
real part x and its imaginary part y are constructible. 

Proof If z is constructible, then a standard euclidean construction draws the 
vertical line L through (x, y) which is parallel to the y-axis. It follows that x is 
constructible, for the point (x, 0) is constructible, being the intersection of L and 
the x-axis. Similarly, the point (0, y) is the intersection of the y-axis and a line 
L' through (x, y) which is parallel to the x-axis. It follows that P = (y, 0) is 
constructible, for it is in the intersection of C[O; OP] with the x-axis. Hence, y
is a constructible number. 

Conversely, assume that x and y are constructible numbers; that is, Q = 
(x, 0) and P = (y, 0) are constructible points. The point (0, y) is constructible, 
being the intersection of the y-axis and C[O; OP]. One can draw the vertical
line through (x, 0) as well as the horizontal line through (0, y), and (x, y) is 
the intersection of these lines. Therefore, (x, y) is a constructible point, and so 
z = x + iy is a constructible number. • 


Definition. We denote by $K$ the subset of $\mathbb{C}$ consisting of all the
constructible numbers.

Theorem 4.46. The set $K \cap \mathbb{R}$ of all constructible real numbers is
a subfield of $\mathbb{R}$ that is closed under square roots of its positive
elements.

Proof. Let $a$ and $b$ be constructible reals.

(i) $-a$ is constructible. If $P = (a, 0)$ is a constructible point, then
$(-a, 0)$ is the other intersection of the x-axis and $C[O; OP]$.

(ii) a + b is constructible. 


Figure 4.2 $a + b$


Assume that $a$ and $b$ are positive. Let $I = (0, 1)$, $P = (a, 0)$, and
$Q = (b, 1)$. Now $Q$ is constructible: it is the intersection of the
horizontal line through $I$ and the vertical line through $(b, 0)$ [the latter
point is constructible, by hypothesis]. The line through $Q$ parallel to $IP$
intersects the x-axis in $S = (a + b, 0)$, as desired.

To construct $b - a$, let $P = (-a, 0)$ in Figure 4.2. Thus, both $a + b$ and
$-a + b$ are constructible; by part (i), both $-a - b$ and $a - b$ are also
constructible.

(iii) ab is constructible. 



By part (i), we may assume that both $a$ and $b$ are positive. In Figure 4.3,
$A = (1, 0)$, $B = (1 + a, 0)$, and $C = (0, b)$. Define $D$ to be the
intersection of the y-axis and the line through $B$ parallel to $AC$. Since
the triangles $\triangle OAC$ and $\triangle OBD$ are similar,

$|OB|/|OA| = |OD|/|OC|$;

hence $(a + 1)/1 = (b + |CD|)/b$, and $|CD| = ab$. Therefore, $b + ab$ is
constructible. Since $-b$ is constructible, by part (i), we have
$ab = (b + ab) - b$ constructible, by part (ii).

(iv) If $a \neq 0$, then $a^{-1}$ is constructible. Let $A = (1, 0)$,
$S = (0, a)$, and $T = (0, 1 + a)$. Define $B$ as the intersection of the
x-axis and the line through $T$ parallel to $AS$; thus, $B = (1 + u, 0)$ for
some $u$. Similarity of the triangles $\triangle OSA$ and $\triangle OTB$
gives

$|OT|/|OS| = |OB|/|OA|$.

Hence, $(1 + a)/a = (1 + u)/1$, and so $u = a^{-1}$. Therefore, $1 + a^{-1}$
is constructible, and so $(1 + a^{-1}) - 1 = a^{-1}$ is constructible.

Figure 4.4 $a^{-1}$

(v) If $a \geq 0$, then $\sqrt{a}$ is constructible. Let $A = (1, 0)$ and
$P = (1 + a, 0)$; construct $Q$, the midpoint of $OP$. Define $R$ as the
intersection of the circle $C[Q; QO]$ with the vertical line through $A$. The
(right) triangles $\triangle AOR$ and $\triangle ARP$ are similar, so that

$|OA|/|AR| = |AR|/|AP|$,

and, hence, $|AR| = \sqrt{a}$. •

Figure 4.5 $\sqrt{a}$
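The constructions in parts (iii) through (v) can be checked in coordinates. The following is a minimal sketch rather than part of the text: the helper names (`meet_x_axis`, `meet_y_axis`, `product`, `inverse`, `square_root`) are assumptions made for illustration, and floating-point arithmetic stands in for exact geometry.

```python
import math


def meet_y_axis(point, direction):
    """Intersect the line {point + t*direction} with the y-axis (x = 0)."""
    (px, py), (dx, dy) = point, direction
    t = -px / dx
    return (0.0, py + t * dy)


def meet_x_axis(point, direction):
    """Intersect the line {point + t*direction} with the x-axis (y = 0)."""
    (px, py), (dx, dy) = point, direction
    t = -py / dy
    return (px + t * dx, 0.0)


def product(a, b):
    """Part (iii): the line through B parallel to AC meets the y-axis at D = (0, b + ab)."""
    A, B, C = (1.0, 0.0), (1.0 + a, 0.0), (0.0, b)
    direction = (C[0] - A[0], C[1] - A[1])
    D = meet_y_axis(B, direction)
    return D[1] - b  # |CD| = ab


def inverse(a):
    """Part (iv): the line through T parallel to AS meets the x-axis at B = (1 + 1/a, 0)."""
    A, S, T = (1.0, 0.0), (0.0, a), (0.0, 1.0 + a)
    direction = (S[0] - A[0], S[1] - A[1])
    B = meet_x_axis(T, direction)
    return B[0] - 1.0


def square_root(a):
    """Part (v): the circle C[Q; QO], Q the midpoint of OP, meets x = 1 at height sqrt(a)."""
    q = (1.0 + a) / 2.0  # Q = (q, 0) and the radius is q
    return math.sqrt(q * q - (1.0 - q) ** 2)
```

For example, `product(3.0, 5.0)` recovers 15, `inverse(4.0)` recovers 0.25, and `square_root(2.0)` agrees with `math.sqrt(2.0)` up to rounding.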

Corollary 4.47. The set $K$ of all constructible numbers is a subfield of
$\mathbb{C}$ that is closed under square roots.

Proof. If $z = a + ib$ and $w = c + id$ are constructible, then
$a, b, c, d \in K \cap \mathbb{R}$, by Lemma 4.45. Hence,
$a + c, b + d \in K \cap \mathbb{R}$, because $K \cap \mathbb{R}$ is a subfield
of $\mathbb{R}$ (Theorem 4.46), and so $(a + c) + i(b + d) \in K$, by
Lemma 4.45. Similarly, $zw = (ac - bd) + i(ad + bc) \in K$. If $z \neq 0$,
then $z^{-1} = (a/z\bar{z}) - i(b/z\bar{z})$. Now
$a, b \in K \cap \mathbb{R}$, by Lemma 4.45, so that
$z\bar{z} = a^2 + b^2 \in K \cap \mathbb{R}$, because $K \cap \mathbb{R}$ is a
subfield of $\mathbb{R}$. Therefore, $z^{-1} \in K$, and so $K$ is a subfield
of $\mathbb{C}$.

If $z = a + ib \in K$, then $a, b \in K \cap \mathbb{R}$, by Lemma 4.45, and so
$r^2 = a^2 + b^2 \in K \cap \mathbb{R}$. Since $r^2$ is nonnegative, we have
$r = \sqrt{r^2} \in K \cap \mathbb{R}$ and then
$\sqrt{r} \in K \cap \mathbb{R}$, by Theorem 4.46. Now $z = re^{i\theta}$, so
that $e^{i\theta} = r^{-1}z \in K$ (we may assume $z \neq 0$), because $K$ is a
subfield of $\mathbb{C}$. That every angle can be bisected gives
$e^{i\theta/2} \in K$, and so $\sqrt{z} = \sqrt{r}\,e^{i\theta/2} \in K$, as
desired. •

Corollary 4.48. If $a, b, c$ are constructible, then the roots of the quadratic
$ax^2 + bx + c$ are also constructible.

Proof. This follows from the quadratic formula and Corollary 4.47. •

We are now going to give an algebraic characterization of the geometric idea
of constructibility. Recall that if $K/k$ is an extension of fields (that is,
$k$ is a subfield of a field $K$), then $K$ may be regarded as a vector space
over $k$. The dimension of $K$, denoted by $[K : k]$, is called the degree of
$K/k$. In particular, if $E/k$ is an extension and $z \in E$ is a root of an
irreducible polynomial $p(x) \in k[x]$, then Proposition 4.30 gives
$[k(z) : k] = \dim_k(k(z)) = \deg(p)$.

Definition. A 2-tower is a tower of subfields of $\mathbb{C}$,

$\mathbb{Q}(i) = F_0 \subseteq F_1 \subseteq \cdots \subseteq F_n$,

with $[F_j : F_{j-1}] \leq 2$ for all $j \geq 1$. A complex number $z$ is
polyquadratic if there is a 2-tower
$\mathbb{Q}(i) = F_0 \subseteq F_1 \subseteq \cdots \subseteq F_n$ with
$z \in F_n$. Denote the set of all polyquadratic complex numbers by
$\mathcal{P}$.

We now begin a series of lemmas which culminates in Theorem 4.53, which says
that a complex number is constructible if and only if it is polyquadratic.

Lemma 4.49. If $F/k$ is a field extension, then $[F : k] = 2$ if and only if
$F = k(u)$, where $u \in F$ is a root of some quadratic polynomial
$f(x) \in k[x]$.

Proof. If $[F : k] = 2$, then $F \neq k$ and there is some $u \in F$ with
$u \notin k$. By Proposition 4.32, there is some (irreducible) polynomial
$f(x) \in k[x]$ having $u$ as a root. We have
$2 = [F : k] = [F : k(u)][k(u) : k]$, by Theorem 4.31. Now
$[k(u) : k] \geq 2$, for $[k(u) : k] \neq 1$ because $k(u) \neq k$; hence,
$[F : k(u)] = 1$ and $F = k(u)$. Moreover, Proposition 4.30 gives
$\deg(f) = 2$, so that $u$ is a root of a quadratic polynomial.

Conversely, let $F = k(u)$, where $u$ is a root of a quadratic polynomial
$f(x) \in k[x]$. (We may assume that $f(x)$ is irreducible, lest $u \in k$ and
$F = k$, contradicting $[F : k] = 2$.) A basis for $k(u)$ over $k$ is $1, u$,
by Proposition 4.30, and so $[F : k] = [k(u) : k] = 2$. •

Lemma 4.50. 

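(i) $\mathcal{P}$ is a subfield of $\mathbb{C}$ that is closed under square
roots.

(ii) A complex number $z = a + ib$, where $a, b \in \mathbb{R}$, is
polyquadratic if and only if both $a$ and $b$ are polyquadratic.

Proof.

(i) If $z, z' \in \mathcal{P}$, then there are 2-towers
$\mathbb{Q}(i) = F_0 \subseteq F_1 \subseteq \cdots \subseteq F_n$ and
$\mathbb{Q}(i) = F'_0 \subseteq F'_1 \subseteq \cdots \subseteq F'_m$ with
$z \in F_n$ and $z' \in F'_m$. Now $[F_j : F_{j-1}] \leq 2$ implies
$F_j = F_{j-1}(u_j)$, where $u_j \in F_j$ is a root of some quadratic
$f_j(x) \in F_{j-1}[x]$. For all $j$ with $1 \leq j \leq n$, define
$F''_j = F'_m(u_1, \ldots, u_j)$, and set $F''_0 = F'_m$. Since
$F''_j = F''_{j-1}(u_j)$, we have
$F_{j-1} = F_0(u_1, \ldots, u_{j-1}) \subseteq F'_m(u_1, \ldots, u_{j-1}) = F''_{j-1}$,
so that $f_j(x) \in F''_{j-1}[x]$ and $[F''_j : F''_{j-1}] \leq 2$. Hence,

$\mathbb{Q}(i) = F'_0 \subseteq F'_1 \subseteq \cdots \subseteq F'_m \subseteq F''_1 \subseteq \cdots \subseteq F''_n$

is a 2-tower. Of course, every element of $F''_n$ is polyquadratic; since
$F''_n$ contains both $z$ and $z'$, it contains their inverses and their sum
and product. Therefore, $\mathcal{P}$ is a subfield.

Let $z \in \mathcal{P}$. If
$\mathbb{Q}(i) = F_0 \subseteq F_1 \subseteq \cdots \subseteq F_n$ is a 2-tower
with $z \in F_n$, then
$\mathbb{Q}(i) = F_0 \subseteq F_1 \subseteq \cdots \subseteq F_n \subseteq F_n(\sqrt{z})$
is also a 2-tower.

(ii) If both $a, b \in \mathcal{P}$, then $z = a + ib \in \mathcal{P}$, for
$\mathcal{P}$ is a subfield containing $i$. Conversely, let
$\mathbb{Q}(i) = F_0 \subseteq F_1 \subseteq \cdots \subseteq F_n$ be a 2-tower
with $z \in F_n$. Since complex conjugation is an automorphism of $\mathbb{C}$,
$\mathbb{Q}(i) = \bar{F}_0 \subseteq \bar{F}_1 \subseteq \cdots \subseteq \bar{F}_n$
is a 2-tower with $\bar{z} \in \bar{F}_n$; hence, $\bar{z}$ is polyquadratic.
Therefore, $a = \frac{1}{2}(z + \bar{z}) \in \mathcal{P}$ and
$b = \frac{1}{2i}(z - \bar{z}) \in \mathcal{P}$. •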

Lemma 4.51. Let $P = a + ib$, $Q = c + id \in \mathcal{P}$.

(i) The line $L[P, Q]$ has equation $x = a$ if it is vertical ($c = a$) or
$y = mx + q$ if it is not vertical ($c \neq a$), where $m, q \in \mathcal{P}$.

(ii) The circle $C[P; PQ]$ has equation $(x - a)^2 + (y - b)^2 = r^2$, where
$a, b, r \in \mathcal{P}$.

Proof. Lemma 4.50 gives $a, b, c, d \in \mathcal{P}$.

(i) If $L[P, Q]$ is not vertical, then its equation is $y = mx + q$, where
$m = (d - b)/(c - a)$ and $q = -ma + b$. Hence $m, q \in \mathcal{P}$.

(ii) The circle $C[P; PQ]$ has equation $(x - a)^2 + (y - b)^2 = r^2$, where
$r$ is the distance from $P$ to $Q$. Now $a, b \in \mathcal{P}$, by
Lemma 4.50(ii), and since $\mathcal{P}$ is closed under square roots,
$r = \sqrt{(c - a)^2 + (d - b)^2} \in \mathcal{P}$. •

Proposition 4.52. Every polyquadratic number $z$ is constructible.

Proof. If $z \in \mathcal{P}$, then there is a 2-tower
$\mathbb{Q}(i) = F_0 \subseteq F_1 \subseteq \cdots \subseteq F_n$ with
$z \in F_n$; we prove that $z \in K$ by induction on $n \geq 0$. The base step
is true, for $F_0 = \mathbb{Q}(i) \subseteq K$, by Corollary 4.47. Now
$F_n = F_{n-1}(u)$, where $u$ is a root of a quadratic
$f(x) = x^2 + bx + c \in F_{n-1}[x]$. The quadratic formula gives
$u \in F_{n-1}(\sqrt{b^2 - 4c})$; but $K$ is closed under square roots, by
Corollary 4.47, so that $\sqrt{b^2 - 4c} \in K$. The inductive hypothesis
$F_{n-1} \subseteq K$ now gives
$z \in F_{n-1}(\sqrt{b^2 - 4c}) \subseteq K(\sqrt{b^2 - 4c}) \subseteq K$. •

Here is the result we have been seeking. 

Theorem 4.53. A number $z \in \mathbb{C}$ is constructible if and only if $z$
is polyquadratic.

Proof. In light of Proposition 4.52, $\mathcal{P} \subseteq K$, and so it
suffices to prove that every constructible $z$ is polyquadratic:
$K \subseteq \mathcal{P}$. There are complex numbers
$1, w_0 = -1, w_1, \ldots, w_m = z$ with $w_j$ built from
$1, w_0, w_1, \ldots, w_{j-1}$ for all $j > 0$. We prove, by induction on
$m \geq 0$, that $w_m$ is polyquadratic. Since $w_0 = -1$ is polyquadratic, we
may pass to the inductive step. By the inductive hypothesis, we may assume that
$w_0, \ldots, w_{m-1}$ are polyquadratic, and so it suffices to prove that if
$z$ is built from $P, Q, R, S$, where $P, Q, R, S$ are polyquadratic, then $z$
is polyquadratic.

Case 1: $z \in L[P, Q] \cap L[R, S]$.

If $L[P, Q]$ is vertical, then it has equation $x = a$; if $L[P, Q]$ is not
vertical, then Lemma 4.51 says that $L[P, Q]$ has equation $y = mx + q$, where
$m, q \in \mathcal{P}$. Similarly, $L[R, S]$ has equation $x = c$ or
$y = m'x + p$, where $m', p \in \mathcal{P}$. Since these lines are not
parallel, one can solve the linear system

$y = mx + q$
$y = m'x + p$

for $z = x_0 + iy_0 \in L[P, Q] \cap L[R, S]$. Therefore,
$z = x_0 + iy_0 \in \mathcal{P}$.

Case 2: $z \in L[P, Q] \cap C[R; RS]$.

If $R = (u, v)$ and $S = (s, t)$, the circle $C[R; RS]$ has equation
$(x - u)^2 + (y - v)^2 = \rho^2$, where $\rho^2 = (u - s)^2 + (v - t)^2$;
moreover, all coefficients lie in $\mathcal{P}$, by Lemma 4.51. If the line
$L[P, Q]$ is vertical, its equation is $x = a$. If
$z = x_0 + iy_0 \in L[P, Q] \cap C[R; RS]$, then $x_0 = a$ and
$(x_0 - u)^2 + (y_0 - v)^2 = \rho^2$, so that $y_0$ is a root of a quadratic in
$\mathcal{P}[x]$, and $z = a + iy_0 \in \mathcal{P}$. If the line $L[P, Q]$ is
not vertical, its equation is $y = mx + q$, where $m, q \in \mathcal{P}$. If
$z = x_0 + iy_0 \in L[P, Q] \cap C[R; RS]$, then
$(x_0 - u)^2 + (mx_0 + q - v)^2 = \rho^2$, and so $x_0 \in \mathcal{P}$, for it
is a root of a quadratic in $\mathcal{P}[x]$. Hence,
$y_0 = mx_0 + q \in \mathcal{P}$ and $z = x_0 + iy_0 \in \mathcal{P}$.


Case 3: $z \in C[P; PQ] \cap C[R; RS]$.

If $R = (u, v)$ and $S = (s, t)$, the circle $C[R; RS]$ has equation
$(x - u)^2 + (y - v)^2 = \rho^2$, where $\rho^2 = (u - s)^2 + (v - t)^2$;
similarly, if $P = (a, b)$ and $Q = (c, d)$, the circle $C[P; PQ]$ has equation
$(x - a)^2 + (y - b)^2 = r^2$, where $r^2 = (c - a)^2 + (d - b)^2$. By
Lemma 4.51, all the coefficients lie in $\mathcal{P}$. If
$z = x_0 + iy_0 \in C[P; PQ] \cap C[R; RS]$, then expanding the equations of
the circles gives an equation of the form

$x_0^2 + y_0^2 + \alpha x_0 + \beta y_0 + \gamma = 0 = x_0^2 + y_0^2 + \alpha' x_0 + \beta' y_0 + \gamma'$.

Canceling $x_0^2 + y_0^2$ gives a linear equation
$\lambda x + \mu y + \nu = 0$ with $\lambda, \mu, \nu \in \mathcal{P}$; indeed,
$\lambda x + \mu y + \nu = 0$ is the equation of a line $L[P', Q']$ with
$P', Q' \in \mathcal{P}$ [for example, when $\lambda$ and $\mu$ are both
nonzero, take $P' = (0, -\nu/\mu)$ and $Q' = (-\nu/\lambda, 0)$]. Thus, the
points $z \in C[P; PQ] \cap C[R; RS]$ are the points of intersection of the
line $L[P', Q']$ and either circle. The argument in Case 2 now shows that
$z \in \mathcal{P}$. •
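Case 3 reduces two circles to a line and a circle. A rough numerical sketch of that reduction follows; the function name `circle_circle` and the use of floating-point numbers are assumptions for illustration, whereas the proof itself works with exact elements of the field of polyquadratic numbers.

```python
import math


def circle_circle(a, b, r, u, v, rho):
    """Intersect (x-a)^2 + (y-b)^2 = r^2 with (x-u)^2 + (y-v)^2 = rho^2."""
    # Subtracting the expanded equations cancels x^2 + y^2 and leaves
    # the linear equation lam*x + mu*y + nu = 0.
    lam = 2.0 * (u - a)
    mu = 2.0 * (v - b)
    nu = (a * a + b * b - r * r) - (u * u + v * v - rho * rho)
    if lam == 0.0 and mu == 0.0:
        raise ValueError("concentric circles: no line to intersect with")
    points = []
    if mu != 0.0:
        # Non-vertical line y = m*x + q; substitute into the first circle
        # to obtain a quadratic A*x^2 + B*x + C = 0, as in Case 2.
        m, q = -lam / mu, -nu / mu
        A = 1.0 + m * m
        B = 2.0 * (m * (q - b) - a)
        C = a * a + (q - b) ** 2 - r * r
        disc = B * B - 4.0 * A * C
        if disc < 0.0:
            return []
        for sign in (1.0, -1.0):
            x = (-B + sign * math.sqrt(disc)) / (2.0 * A)
            points.append((x, m * x + q))
    else:
        # Vertical line x = -nu/lam; solve the first circle for y.
        x = -nu / lam
        disc = r * r - (x - a) ** 2
        if disc < 0.0:
            return []
        for sign in (1.0, -1.0):
            points.append((x, b + sign * math.sqrt(disc)))
    return points
```

Applied to the two circles of Figure 4.1 (centers $(1, 0)$ and $(-1, 0)$, radius 2), the line is $x = 0$ and the intersection points are $(0, \pm\sqrt{3})$, in agreement with Example 4.44.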

Corollary 4.54. If a complex number $z$ is constructible, then
$[\mathbb{Q}(z) : \mathbb{Q}]$ is a power of 2.

Proof. This follows from Theorems 4.53 and 4.31. •

Remark. The converse of this corollary is false; it can be shown that there are
nonconstructible numbers $z$ with $[\mathbb{Q}(z) : \mathbb{Q}] = 4$.

Remark. It was proved by G. Mohr in 1672 and, independently, by L. Mascheroni
in 1797, that every geometric construction carried out by straightedge and
compass can be done without the straightedge. There is a short proof of this
theorem given by N. Hungerbühler in The American Mathematical Monthly, 101
(1994), pp. 784-787. ◄

Two of the classical Greek problems were solved by P. L. Wantzel (1814-1848)
in 1837.


Theorem 4.55 (Wantzel). It is impossible to duplicate the cube using only
straightedge and compass.

Proof.⁶ The question is whether $z = \sqrt[3]{2}$ is constructible. Since
$x^3 - 2$ is irreducible in $\mathbb{Q}[x]$, we have
$[\mathbb{Q}(z) : \mathbb{Q}] = 3$; but 3 is not a power of 2, and so $z$ is
not constructible, by Corollary 4.54. •

Consider how ingenious this proof is. At the beginning of this section, we 
asked the reader to ponder how one might prove impossibility. The idea here is 
to translate the problem of constructibility into a statement of algebra, and then 
to show that the existence of a geometric construction produces an algebraic 
contradiction. 

A student in one of my classes, imbued with the idea of continual progress 
through technology, asked me, “Will it ever be possible to duplicate the cube 
with straightedge and compass?” Impossible here is used in its literal sense. 

Theorem 4.56 (Wantzel). It is impossible to trisect a 60° angle using only 
straightedge and compass. 

Proof. We may assume that one side of the angle is on the x-axis, and so the
question is whether $z = \cos 20^\circ + i \sin 20^\circ$ is constructible. If
$z$ were constructible, then Lemma 4.45 would show that $\cos 20^\circ$ is
constructible. Corollary 1.23, the triple angle formula, gives
$\cos 3\alpha = 4\cos^3\alpha - 3\cos\alpha$. Setting $\alpha = 20^\circ$, we
have $\cos 3\alpha = \frac{1}{2}$, so that $\cos 20^\circ$ is a root of
$4x^3 - 3x - \frac{1}{2}$; equivalently, $\cos 20^\circ$ is a root of
$f(x) = 8x^3 - 6x - 1 \in \mathbb{Z}[x]$. Now $f(x)$ is irreducible in
$\mathbb{Q}[x]$ because $f(x)$ is irreducible mod 7 (Theorem 3.98). Therefore,
$3 = [\mathbb{Q}(\cos 20^\circ) : \mathbb{Q}]$, by Theorem 3.116(iv), and so
$\cos 20^\circ$ is not constructible, because 3 is not a power of 2. •
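The arithmetic behind the two Wantzel proofs is easy to spot-check. The sketch below is illustrative only (the names `f`, `divisors`, and `rational_roots` are assumptions, not notation from the text): it confirms numerically that $\cos 20^\circ$ is a root of $8x^3 - 6x - 1$, that this cubic has no roots mod 7, and that neither it nor $x^3 - 2$ has a rational root. A cubic with no root in a field has no linear factor there, hence is irreducible over that field.

```python
import math
from fractions import Fraction


def f(x):
    """The cubic 8x^3 - 6x - 1 from the proof of Theorem 4.56."""
    return 8 * x**3 - 6 * x - 1


# cos 20 degrees should be a root, up to floating-point error.
residual = f(math.cos(math.radians(20)))

# A cubic over Z/7 is irreducible exactly when it has no root mod 7.
roots_mod_7 = [x for x in range(7) if f(x) % 7 == 0]


def divisors(n):
    """Positive divisors of a nonzero integer."""
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]


def rational_roots(coeffs):
    """Rational roots of an integer polynomial with nonzero constant term.

    coeffs lists the coefficients from highest degree to lowest; candidates
    p/q come from the rational root test.
    """
    lead, const = coeffs[0], coeffs[-1]
    found = set()
    for p in divisors(const):
        for q in divisors(lead):
            for cand in (Fraction(p, q), Fraction(-p, q)):
                value = sum(c * cand**k for k, c in enumerate(reversed(coeffs)))
                if value == 0:
                    found.add(cand)
    return sorted(found)
```

Here `rational_roots([1, 0, 0, -2])` and `rational_roots([8, 0, -6, -1])` both return an empty list, and `roots_mod_7` is empty as well.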

If the rules of constructibility are relaxed, then an angle can be trisected. 

Theorem 4.57 (Archimedes). Every angle can be trisected using ruler and
compass. (Recall that a ruler is a straightedge on which points $U$ and $V$ can
be marked; moreover, the point $U$ is allowed to slide along a circle.)

Proof. Since it is easy to construct 30°, 60°, and 90°, it suffices to trisect
an acute angle $\alpha$, for if $3\beta = \alpha$, then
$3(\beta + 30^\circ) = \alpha + 90^\circ$,
$3(\beta + 60^\circ) = \alpha + 180^\circ$, and
$3(\beta + 90^\circ) = \alpha + 270^\circ$.

Draw the given angle $\alpha = \angle AOE$, where the origin $O$ is the center
of the unit circle. Take a ruler on which the distance 1 has been marked; that
is, there are points $U$ and $V$ on the ruler with $|UV| = 1$. There is a chord
through $A$ parallel to $EF$; place the ruler so that this chord is $AU$. Since
$\alpha$ is acute, $U$ lies in the first quadrant. Keeping $A$ on the sliding
ruler, move the point $U$ down the circle; the ruler intersects the extended
diameter $EF$ in some point $X$ with $|UX| > 1$. Continue moving $U$ down the
circle, keeping $A$ on the sliding ruler, until the ruler intersects $EF$ in
the point $X = C$.

⁶ The notion of dimension of a vector space was not known in the early 19th
century; in place of Corollary 4.54, Wantzel proved that if a number is
constructible, then it is a root of an irreducible polynomial in
$\mathbb{Q}[x]$ of degree $2^n$ for some $n$.



Relabel the points as in Figure 4.7, so that $U = B$ and $|BC| = 1$. We claim
that $\beta = \angle BCO = \frac{1}{3}\alpha$. Now

$\alpha = \delta + \beta$,

because $\alpha$ is an exterior angle of $\triangle AOC$, and hence it is the
sum of the two opposite internal angles. Since $\triangle OAB$ is isosceles
($OA$ and $OB$ are radii), $\delta = \varepsilon$, and so

$\alpha = \varepsilon + \beta$.

But $\varepsilon = \gamma + \beta = 2\beta$, for it is an exterior angle of the
isosceles triangle $\triangle BCO$; therefore,

$\alpha = 2\beta + \beta = 3\beta$. •

Theorem 4.58 (Lindemann). It is impossible to square the circle with
straightedge and compass.

Proof. The problem is whether one can construct a square whose area is the
same as the area of the unit circle. If a side of the square has length $z$,
then one is asking whether $z = \sqrt{\pi}$ is constructible. Now
$\mathbb{Q}(\pi)$ is a subspace of $\mathbb{Q}(\sqrt{\pi})$. We have already
mentioned that Lindemann proved that $\pi$ is transcendental (over
$\mathbb{Q}$), so that $[\mathbb{Q}(\pi) : \mathbb{Q}]$ is infinite. It follows
from Corollary 4.25(ii) that $[\mathbb{Q}(\sqrt{\pi}) : \mathbb{Q}]$ is also
infinite. Thus, $[\mathbb{Q}(\sqrt{\pi}) : \mathbb{Q}]$ is surely not a power
of 2, and so $\sqrt{\pi}$ is not constructible. •

Sufficiency of the following result was discovered, around 1796, by Gauss, 
when he was still in his teens (he wrote that this result led to his decision to be- 
come a mathematician). He claimed necessity as well, but none of his published 
papers contains a complete proof of it. The first published proof of necessity is 
due to P. L. Wantzel, in 1837. 

Theorem 4.59 (Gauss-Wantzel). Let $p$ be an odd prime. A regular $p$-gon is
constructible if and only if $p = 2^{2^t} + 1$ for some $t \geq 0$.

Proof. We only prove necessity; for sufficiency, see Theorem 5.41. The problem
is whether $z = e^{2\pi i/p}$ is constructible. Now $z$ is a root of the
cyclotomic polynomial $\Phi_p(x)$, which is an irreducible polynomial in
$\mathbb{Q}[x]$ of degree $p - 1$, by Corollary 3.104.

If $z$ is constructible, then $p - 1 = 2^s$ for some $s$ (by Corollary 4.54),
so that

$p = 2^s + 1$.

We claim that $s$ itself is a power of 2. Otherwise, there is an odd number
$k > 1$ with $s = km$. Now $k$ odd implies that $-1$ is a root of $x^k + 1$;
in fact, there is a factorization in $\mathbb{Z}[x]$:

$x^k + 1 = (x + 1)(x^{k-1} - x^{k-2} + x^{k-3} - \cdots + 1)$.

Thus, setting $x = 2^m$ gives a forbidden factorization of $p$ in $\mathbb{Z}$:

$p = 2^s + 1 = (2^m)^k + 1 = [2^m + 1][(2^m)^{k-1} - (2^m)^{k-2} + (2^m)^{k-3} - \cdots + 1]$. •

Gauss constructed a regular 17-gon explicitly, a feat the Greeks would have
envied. On the other hand, it follows, for example, that it is impossible to
construct regular 7-gons, 11-gons, or 13-gons.

Numbers $F_t$ of the form $F_t = 2^{2^t} + 1$ are called Fermat primes if they
are prime. For $0 \leq t \leq 4$, one can check that $F_t$ is, indeed, prime;
they are

3, 5, 17, 257, and 65,537.

It is known that the next few values of $t$ give composite numbers, and it is
unknown whether any other Fermat primes exist.
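The primality claims for small Fermat numbers can be checked by trial division. This is a minimal sketch; the names `is_prime` and `fermat` are chosen here for illustration, and trial division is only practical for the first few values of $t$.

```python
def is_prime(n):
    """Trial division; adequate for the small numbers used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


# F_t = 2^(2^t) + 1 for t = 0, ..., 5.
fermat = [2 ** (2 ** t) + 1 for t in range(6)]

# F_0, ..., F_4 are prime.
first_five_prime = all(is_prime(F) for F in fermat[:5])

# F_5 is composite: Euler's factorization.
f5_factors = (641, 6700417)

# The "forbidden factorization" in the proof of Theorem 4.59, for s = 6 = 3 * 2
# (k = 3, m = 2): 2^6 + 1 = (2^2 + 1)((2^2)^2 - 2^2 + 1).
s_six = (2**2 + 1) * ((2**2) ** 2 - 2**2 + 1)
```

The exponent $s = 6$ is not a power of 2, and correspondingly $2^6 + 1 = 65 = 5 \cdot 13$ is not prime.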

The following result is known. 

Theorem. A regular n-gon is constructible if and only if n is a product of a 
power of 2 and distinct Fermat primes. 

Proof See Theorem 5.41. • 

4.3 Linear Transformations 

Homomorphisms between vector spaces are called linear transformations. 

Definition. A function $T : V \to W$, where $V$ and $W$ are vector spaces over
a field $k$, is a linear transformation if, for all vectors $u, v \in V$ and
all scalars $a \in k$,

(i) $T(u + v) = T(u) + T(v)$;

(ii) $T(av) = aT(v)$.

We say that a linear transformation $T$ is nonsingular (or is an isomorphism)
if $T$ is a bijection. Two vector spaces $V$ and $W$ over $k$ are isomorphic,
denoted by $V \cong W$, if there is a nonsingular linear transformation
$T : V \to W$.

It is easy to see that a linear transformation $T$ preserves all linear
combinations:

$T(a_1v_1 + \cdots + a_mv_m) = a_1T(v_1) + \cdots + a_mT(v_m)$.


Example 4.60. 

(i) The identity function $1_V : V \to V$ on any vector space $V$ is a
nonsingular linear transformation.

(ii) If $T : U \to V$ and $S : V \to W$ are linear transformations, then so is
their composite $S \circ T : U \to W$. Moreover, if $T$ is nonsingular, then
its inverse function $T^{-1} : V \to U$ is also a linear transformation.

(iii) If $V$ and $W$ are vector spaces over a field $k$, write

$\mathrm{Hom}_k(V, W) = \{\text{all linear transformations } V \to W\}$.

Define $S + T$ by $S + T : v \mapsto S(v) + T(v)$ for all $v \in V$, and define
$cT$, where $c \in k$, by $cT : v \mapsto cT(v)$ for all $v \in V$. It is
routine to check that both $S + T$ and $cT$ are linear transformations and that
$\mathrm{Hom}_k(V, W)$ is a vector space over $k$.

(iv) The function $T_A : k^n \to k^m$, defined by $T_A(x) = Ax$, where $A$ is
an $m \times n$ matrix, $x$ is an $n \times 1$ column vector, and $Ax$ is
matrix multiplication, is easily seen to be a linear transformation. We shall
see, in Proposition 4.63, that every linear transformation $k^n \to k^m$ is
equal to $T_A$ for some $m \times n$ matrix $A$. ◄

We now show how to construct linear transformations T : V — > W, where 
V and W are vector spaces over a field k. The next theorem says that there is a 
linear transformation that does anything to a basis. 

Theorem 4.61. Let $v_1, \ldots, v_n$ be a basis of a vector space $V$ over a
field $k$. If $W$ is a vector space over $k$ and $w_1, \ldots, w_n$ is a list
in $W$, then there exists a unique linear transformation $T : V \to W$ with
$T(v_i) = w_i$ for all $i$.

Proof. By Theorem 4.15, each $v \in V$ has a unique expression of the form
$v = \sum_i a_iv_i$, and so $T : V \to W$, given by $T(v) = \sum_i a_iw_i$, is
a (well-defined!) function. It is now a routine verification to check that $T$
is a linear transformation.

To prove uniqueness of $T$, assume that $S : V \to W$ is a linear
transformation with $S(v_i) = w_i = T(v_i)$ for all $i$. If $v \in V$, then
$v = \sum_i a_iv_i$ and

$S(v) = S(\sum_i a_iv_i) = \sum_i S(a_iv_i) = \sum_i a_iS(v_i) = \sum_i a_iT(v_i) = T(v)$.

Since $v$ is arbitrary, $S = T$. •

Corollary 4.62. If linear transformations $S, T : V \to W$ agree on a basis,
then $S = T$.

Proof. If $v_1, \ldots, v_n$ is a basis of $V$ and if $S(v_i) = T(v_i)$ for all
$i$, then the uniqueness statement in Theorem 4.61 gives $S = T$. •

Linear transformations $k^n \to k^m$ are easy to describe; every one arises
from matrix multiplication, as in Example 4.60(iv).

Proposition 4.63. If $T : k^n \to k^m$ is a linear transformation, then there
exists a unique $m \times n$ matrix $A$ such that

$T(y) = Ay$

for all $y \in k^n$ (here, $y$ is an $n \times 1$ column matrix and $Ay$ is
matrix multiplication).

Proof. If $e_1, \ldots, e_n$ is the standard basis of $k^n$ and
$e'_1, \ldots, e'_m$ is the standard basis of $k^m$, define $A = [a_{ij}]$ to
be the matrix whose $j$th column is the coordinate list of $T(e_j)$. If
$S : k^n \to k^m$ is defined by $S(y) = Ay$, then $S = T$ because they agree on
a basis: $T(e_j) = \sum_i a_{ij}e'_i = Ae_j = S(e_j)$, the $j$th column of $A$.

Uniqueness of $A$ follows from Corollary 4.62, for the $j$th column of $A$ is
the coordinate list of $T(e_j)$. •
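The proof of Proposition 4.63 is effectively a recipe: apply $T$ to each standard basis vector and use the results as columns. A small sketch under stated assumptions follows; the map `T` below is a made-up example $k^3 \to k^2$, and `matrix_of` and `apply` are hypothetical helper names.

```python
def matrix_of(T, n):
    """Return the m x n matrix (as a list of rows) whose j-th column is T(e_j)."""
    columns = []
    for j in range(n):
        e_j = [1 if i == j else 0 for i in range(n)]
        columns.append(T(e_j))
    m = len(columns[0])
    return [[columns[j][i] for j in range(n)] for i in range(m)]


def apply(A, y):
    """Matrix-vector product A y."""
    return [sum(a * x for a, x in zip(row, y)) for row in A]


def T(y):
    """A hypothetical linear map k^3 -> k^2, chosen only for illustration."""
    a, b, c = y
    return [2 * a - b, b + 3 * c]


A = matrix_of(T, 3)
```

With this choice, `A` is `[[2, -1, 0], [0, 1, 3]]`, and `apply(A, y)` agrees with `T(y)` for every `y`, as the proposition asserts.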

Let $T : V \to W$ be a linear transformation, and let $X = v_1, \ldots, v_n$
and $Y = w_1, \ldots, w_m$ be bases of $V$ and $W$, respectively. The matrix
for $T$ is set up from the equation

$T(v_j) = a_{1j}w_1 + a_{2j}w_2 + \cdots + a_{mj}w_m = \sum_i a_{ij}w_i$.

This is the reason we write $T(v_j) = \sum_i a_{ij}w_i$ instead of
$T(v_j) = \sum_i a_{ji}w_i$, which appears to be more natural.

Example 4.64. 

We show that $R_\psi : \mathbb{R}^2 \to \mathbb{R}^2$, counterclockwise
rotation about the origin by $\psi$ radians, is a linear transformation. If we
identify $\mathbb{R}^2$ with $\mathbb{C}$, then every point can be written (in
polar form) as $(r\cos\theta, r\sin\theta)$, and we have the formula:

$R_\psi(r\cos\theta, r\sin\theta) = (r\cos(\theta + \psi), r\sin(\theta + \psi))$.

Denote the standard basis of $\mathbb{R}^2$ by $e_1, e_2$, where

$e_1 = (1, 0) = (\cos 0, \sin 0)$ and $e_2 = (0, 1) = (\cos\frac{\pi}{2}, \sin\frac{\pi}{2})$.

Thus,

$R_\psi(e_1) = R_\psi(\cos 0, \sin 0) = (\cos\psi, \sin\psi)$,

and

$R_\psi(e_2) = R_\psi(\cos\frac{\pi}{2}, \sin\frac{\pi}{2})$
$= (\cos(\frac{\pi}{2} + \psi), \sin(\frac{\pi}{2} + \psi))$
$= (-\sin\psi, \cos\psi)$.

On the other hand, if $T$ is the linear transformation with

$T(e_1) = (\cos\psi, \sin\psi)$ and $T(e_2) = (-\sin\psi, \cos\psi)$,

then the addition formulas for cosine and sine give

$T(r\cos\theta, r\sin\theta) = r\cos\theta\, T(e_1) + r\sin\theta\, T(e_2)$
$= r\cos\theta(\cos\psi, \sin\psi) + r\sin\theta(-\sin\psi, \cos\psi)$
$= (r[\cos\theta\cos\psi - \sin\theta\sin\psi], r[\cos\theta\sin\psi + \sin\theta\cos\psi])$
$= (r\cos(\theta + \psi), r\sin(\theta + \psi))$
$= R_\psi(r\cos\theta, r\sin\theta)$.

Therefore, $R_\psi = T$, and so $R_\psi$ is a linear transformation.

We gave a geometric proof that $R_\psi$ is a linear transformation in
Proposition 2.57. ◄
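The identity $R_\psi = T$ can be spot-checked numerically. In the sketch below, `rotate` implements the polar formula for $R_\psi$ and `rotate_matrix` implements the linear map $T$ determined by its values on $e_1, e_2$; both names are assumptions for illustration, and agreement is only up to floating-point error.

```python
import math


def rotate(psi, point):
    """R_psi via polar coordinates: add psi to the angle of the point."""
    x, y = point
    r, theta = math.hypot(x, y), math.atan2(y, x)
    return (r * math.cos(theta + psi), r * math.sin(theta + psi))


def rotate_matrix(psi, point):
    """The linear map T with T(e1) = (cos psi, sin psi), T(e2) = (-sin psi, cos psi)."""
    x, y = point
    return (x * math.cos(psi) - y * math.sin(psi),
            x * math.sin(psi) + y * math.cos(psi))
```

For any sample angle and point, the two functions return the same coordinates to within rounding, which is what the addition formulas predict.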

Proposition 4.63 establishes the connection between linear transformations 
and matrices. 


Definition. Let $X = v_1, \ldots, v_n$ be a basis of $V$ and let
$Y = w_1, \ldots, w_m$ be a basis of $W$. If $T : V \to W$ is a linear
transformation, then the matrix of $T$ with respect to $X$ and $Y$ is the
$m \times n$ matrix $A = [a_{ij}]$ whose $j$th column
$a_{1j}, a_{2j}, \ldots, a_{mj}$ is the coordinate list of $T(v_j)$ relative to
$Y$: $T(v_j) = a_{1j}w_1 + \cdots + a_{mj}w_m$. The matrix $A$ does depend on
the choice of bases $X$ and $Y$, and we denote it by

$A = {}_Y[T]_X$.

In case $V = W$, we usually let the bases $X = v_1, \ldots, v_n$ and
$w_1, \ldots, w_m$ coincide. If $1_V : V \to V$, given by $v \mapsto v$, is the
identity linear transformation, then ${}_X[1_V]_X$ is the $n \times n$ identity
matrix $I_n$ (usually, the subscript $n$ is omitted), defined by

$I = [\delta_{ij}]$,

where $\delta_{ij}$ is the Kronecker delta:

$\delta_{ij} = 1$ if $i = j$; $\delta_{ij} = 0$ if $i \neq j$.

Thus, $I$ has 1's on the diagonal and 0's elsewhere. On the other hand, if $X$
and $Y$ are different bases, then ${}_Y[1_V]_X$ is not the identity matrix; its
columns are the coordinate lists of the vectors of $X$ with respect to the
basis $Y$ (such a matrix is often called the transition matrix from $X$ to
$Y$).


Example 4.65. 

Let $V$ be a vector space with basis $X = v_1, \ldots, v_n$, and let
$\sigma \in S_n$ be a permutation. By Theorem 4.61, there is a linear
transformation $T : V \to V$ with $T(v_i) = v_{\sigma(i)}$ for all $i$. The
reader should check that $P_\sigma = {}_X[T]_X$ is the permutation matrix
obtained from the $n \times n$ identity matrix $I$ by permuting its columns by
$\sigma$. ◄

Example 4.66. 

Let $k$ be a field and let $k^n$ be equipped with the usual inner product: if
$v = (a_1, \ldots, a_n)$ and $u = (b_1, \ldots, b_n)$, then
$(v, u) = a_1b_1 + \cdots + a_nb_n$. Define the adjoint⁷ of a linear
transformation $T : k^n \to k^n$ to be a linear transformation
$T^* : k^n \to k^n$ such that

$(Tu, v) = (u, T^*v)$

for all $u, v \in k^n$.

We begin by showing that $T^*$ exists. Let $E = e_1, \ldots, e_n$ be the
standard basis. If $T^*$ does exist, then it would have to satisfy

$(Te_j, e_i) = (e_j, T^*e_i)$

for all $i, j$. But if $Te_j = a_{1j}e_1 + \cdots + a_{nj}e_n$, then
$(Te_j, e_i) = a_{ij}$, by Exercise 4.13 on page 344. With this in mind, let us
define $T^*e_i = a_{i1}e_1 + \cdots + a_{in}e_n$ for each $i$. By
Theorem 4.61, we have defined a linear transformation $T^*$.

If $A = [a_{ij}] = {}_E[T]_E$, then the defining equation for $T^*$ shows that
${}_E[T^*]_E = A^T$; that is, the matrix of the adjoint of $T$ is the transpose
of $A$.

The definition of adjoint can be generalized. If $T : V \to W$ is a linear
transformation, where $V$ and $W$ are vector spaces equipped with inner
products, then its adjoint is a linear transformation $T^* : W \to V$
satisfying $(Tv, w) = (v, T^*w)$ for all $v \in V$ and $w \in W$. ◄
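The relation between adjoint and transpose can be tested on a concrete matrix. This is only an illustrative sketch: the matrix `A` and the vectors `u`, `v` are arbitrary integer choices, and the helper names are hypothetical.

```python
def apply(A, v):
    """Matrix-vector product A v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def transpose(A):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]


def inner(u, v):
    """The usual inner product on k^n."""
    return sum(a * b for a, b in zip(u, v))


A = [[1, 2, 0], [3, -1, 4], [0, 5, 2]]
u = [1, -2, 3]
v = [4, 0, -1]

lhs = inner(apply(A, u), v)             # (Tu, v)
rhs = inner(u, apply(transpose(A), v))  # (u, T*v), with T* given by the transpose
```

Both sides evaluate to the same integer, as the defining equation $(Tu, v) = (u, T^*v)$ requires.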


Example 4.67. 


(i) In Example 4.64, we considered $R_\psi : \mathbb{R}^2 \to \mathbb{R}^2$,
counterclockwise rotation about the origin by $\psi$ radians. The matrix of
$R_\psi$ with respect to the standard basis $E = e_1, e_2$ is

${}_E[R_\psi]_E = \begin{bmatrix} \cos\psi & -\sin\psi \\ \sin\psi & \cos\psi \end{bmatrix}$.

(ii) This example shows that matrices assigned to a given linear
transformation can actually be different. Let
$T : \mathbb{R}^2 \to \mathbb{R}^2$ be counterclockwise rotation about the
origin by $\frac{\pi}{2}$ radians. As in part (i), the matrix of $T$ relative
to the standard basis $X = e_1, e_2$ is

${}_X[T]_X = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$.

The list $Y = v_1, v_2$, where $v_1 = (4, 1)^T$ and $v_2 = (-2, 1)^T$ are
column vectors, is a basis. We compute ${}_Y[T]_Y$ by writing $T(v_1)$ and
$T(v_2)$ as linear combinations of $v_1, v_2$. Now

$T(v_1) = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 4 \\ 1 \end{bmatrix} = \begin{bmatrix} -1 \\ 4 \end{bmatrix}$,

$T(v_2) = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} -2 \\ 1 \end{bmatrix} = \begin{bmatrix} -1 \\ -2 \end{bmatrix}$.

We must find numbers $a, b, c, d$ with

$T(v_1) = \begin{bmatrix} -1 \\ 4 \end{bmatrix} = av_1 + bv_2$,

$T(v_2) = \begin{bmatrix} -1 \\ -2 \end{bmatrix} = cv_1 + dv_2$.

Each of these vector equations gives a system of linear equations:

$4a - 2b = -1$
$a + b = 4$

and

$4c - 2d = -1$
$c + d = -2$.

These are easily solved:

$a = \frac{7}{6}$, $b = \frac{17}{6}$, $c = -\frac{5}{6}$, $d = -\frac{7}{6}$.

It follows that

${}_Y[T]_Y = \frac{1}{6}\begin{bmatrix} 7 & -5 \\ 17 & -7 \end{bmatrix}$.

These computations will be revisited in Example 4.74. ◄

⁷ There is another notion of adjoint, unrelated to this notion, on page 387.
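The coordinates in part (ii) can be recomputed exactly with rational arithmetic. In this sketch, `solve_2x2` is a hypothetical helper that applies Cramer's rule, and `T` is rotation by $\frac{\pi}{2}$ written in standard coordinates.

```python
from fractions import Fraction


def solve_2x2(v1, v2, target):
    """Return (a, b) with a*v1 + b*v2 == target, using Cramer's rule."""
    det = v1[0] * v2[1] - v2[0] * v1[1]
    a = Fraction(target[0] * v2[1] - v2[0] * target[1], det)
    b = Fraction(v1[0] * target[1] - target[0] * v1[1], det)
    return a, b


def T(v):
    """Counterclockwise rotation by pi/2: (x, y) -> (-y, x)."""
    x, y = v
    return (-y, x)


v1, v2 = (4, 1), (-2, 1)
a, b = solve_2x2(v1, v2, T(v1))  # first column of the matrix relative to Y
c, d = solve_2x2(v1, v2, T(v2))  # second column of the matrix relative to Y
matrix_Y = [[a, c], [b, d]]
```

The result is the matrix with rows $(\frac{7}{6}, -\frac{5}{6})$ and $(\frac{17}{6}, -\frac{7}{6})$.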


Example 4.68. 

We have illustrated, given a linear transformation $T : V \to V$ and a basis
$X$ of $V$, how to set up the matrix $A = {}_X[T]_X$. We now reverse the
procedure and show how to construct a linear transformation from an
$n \times n$ matrix over $k$. Consider the matrix

$C = \begin{bmatrix} 0 & 0 & 8 \\ 1 & 0 & -12 \\ 0 & 1 & 6 \end{bmatrix}$.

To define a linear transformation $T : k^3 \to k^3$, it suffices to specify
$T(e_i)$ for each vector in the standard basis $E = e_1, e_2, e_3$. Using the
columns of $C$, we define

$T(e_1) = e_2$, $T(e_2) = e_3$, $T(e_3) = 8e_1 - 12e_2 + 6e_3$.

Of course, $C = {}_E[T]_E$.

We now find the matrix of $T$ with respect to a new basis. Define
$X = x_0, x_1, x_2$ by

$x_0 = e_1$, $x_1 = (C - 2I)e_1$, $x_2 = (C - 2I)^2e_1$.

We prove that $X$ spans $k^3$ by showing that $\langle X \rangle = k^3$.
Clearly, $e_1 = x_0 \in \langle X \rangle$, while
$x_1 = Ce_1 - 2e_1 = e_2 - 2x_0$; hence,

$e_2 = 2x_0 + x_1 \in \langle X \rangle$.

Now $x_2 = C^2e_1 - 4Ce_1 + 4e_1 = e_3 - 4e_2 + 4e_1$, so that

$e_3 = x_2 + 4e_2 - 4e_1$
$= x_2 + 4(2x_0 + x_1) - 4x_0$
$= 4x_0 + 4x_1 + x_2 \in \langle X \rangle$.

But a spanning list of 3 vectors in a 3-dimensional space must be a basis, and
so $X$ is a basis of $k^3$.

What is the matrix $J = {}_X[T]_X$? Using the equations above, the reader may
verify that

$T(x_0) = 2x_0 + x_1$
$T(x_1) = 2x_1 + x_2$
$T(x_2) = 2x_2$.

It follows that the matrix of $T$ with respect to the basis $X$ is:

$J = \begin{bmatrix} 2 & 0 & 0 \\ 1 & 2 & 0 \\ 0 & 1 & 2 \end{bmatrix}$.

◄
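The change of basis in this example can be verified with integer arithmetic. The sketch below assumes that the last column of $C$ is $(8, -12, 6)$, the values for which $T(x_2) = 2x_2$ holds; the helper names `matmul` and `apply` are illustrative. It checks $CX = XJ$, where the columns of the matrix `X` are $x_0, x_1, x_2$, which is equivalent to $J = X^{-1}CX$ because `X` is invertible.

```python
def matmul(A, B):
    """Product of matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def apply(A, v):
    """Matrix-vector product A v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


# Assumed entries: last column (8, -12, 6).
C = [[0, 0, 8], [1, 0, -12], [0, 1, 6]]
N = [[C[i][j] - (2 if i == j else 0) for j in range(3)] for i in range(3)]  # C - 2I

x0 = [1, 0, 0]    # e_1
x1 = apply(N, x0)  # (C - 2I) e_1
x2 = apply(N, x1)  # (C - 2I)^2 e_1

# Matrix whose columns are x0, x1, x2 (upper triangular with 1's on the diagonal).
X = [[x0[i], x1[i], x2[i]] for i in range(3)]
J = [[2, 0, 0], [1, 2, 0], [0, 1, 2]]

same = matmul(C, X) == matmul(X, J)
```

Here `x1` is $e_2 - 2e_1$ and `x2` is $e_3 - 4e_2 + 4e_1$, matching the expressions in the example, and `same` is `True`.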


The following proposition is a paraphrase of Theorem 4.61. 


Proposition 4.69. Let $V$ and $W$ be vector spaces over a field $k$, and let
$X$ and $Y$ be bases of $V$ and $W$, respectively. The function

$\mu_{X,Y} : \mathrm{Hom}_k(V, W) \to \mathrm{Mat}_{m \times n}(k)$,

given by

$T \mapsto {}_Y[T]_X$,

is an isomorphism of vector spaces.

Proof. First, let us show that $\mu_{X,Y}$ is surjective. Given a matrix $A$,
its columns define vectors in $W$. In more detail, if $X = v_1, \ldots, v_n$
and $Y = w_1, \ldots, w_m$, then the $j$th column of $A$ is
$(a_{1j}, \ldots, a_{mj})^T$; define $z_j = \sum_{i=1}^m a_{ij}w_i$. By
Theorem 4.61, there exists a linear transformation $T : V \to W$ with
$T(v_j) = z_j$, and ${}_Y[T]_X = A$. To see that $\mu_{X,Y}$ is injective,
suppose that ${}_Y[T]_X = A = {}_Y[S]_X$. Since the columns of $A$ determine
$T(v_j)$ and $S(v_j)$ for all $j$, Corollary 4.62 gives $S = T$.

Finally, we show that $\mu_{X,Y}$ is a linear transformation. Since, for all
$j$, the $j$th column of $\mu_{X,Y}(S + T)$ is the coordinate list of
$(S + T)(v_j) = S(v_j) + T(v_j)$, we have
$\mu_{X,Y}(S + T) = \mu_{X,Y}(S) + \mu_{X,Y}(T)$. A similar argument shows that
$\mu_{X,Y}(cT) = c\mu_{X,Y}(T)$. •

The next proposition shows where the definition of matrix multiplication 
comes from: the product of two matrices is the matrix of a composite. 

Proposition 4.70. Let $T : V \to W$ and $S : W \to U$ be linear
transformations. Choose bases $X = x_1, \ldots, x_n$ of $V$,
$Y = y_1, \ldots, y_m$ of $W$, and $Z = z_1, \ldots, z_\ell$ of $U$. Then

${}_Z[S \circ T]_X = ({}_Z[S]_Y)({}_Y[T]_X)$.

Proof. Let ${}_Y[T]_X = [a_{ij}]$, so that $T(x_j) = \sum_p a_{pj}y_p$, and let
${}_Z[S]_Y = [b_{qp}]$, so that $S(y_p) = \sum_q b_{qp}z_q$. Then

$(S \circ T)(x_j) = S(T(x_j)) = S(\sum_p a_{pj}y_p)$
$= \sum_p a_{pj}S(y_p) = \sum_p \sum_q a_{pj}b_{qp}z_q = \sum_q c_{qj}z_q$,

where $c_{qj} = \sum_p b_{qp}a_{pj}$. Therefore,

${}_Z[S \circ T]_X = [c_{qj}] = ({}_Z[S]_Y)({}_Y[T]_X)$. •

Corollary 4.71. Matrix multiplication is associative: $A(BC) = (AB)C$.

Proof. Let $A$ be an $m \times n$ matrix, let $B$ be an $n \times p$ matrix, and let $C$ be a
$p \times q$ matrix. By Theorem 4.61, there are linear transformations

$$k^q \xrightarrow{\;T\;} k^p \xrightarrow{\;S\;} k^n \xrightarrow{\;R\;} k^m$$

with $C = [T]$, $B = [S]$, and $A = [R]$ (in order that the proof not be cluttered,
we abbreviate notation by omitting bases: we write $[T]$ instead of ${}_Y[T]_X$).
Then

$$[R \circ (S \circ T)] = [R][S \circ T] = [R]([S][T]) = A(BC).$$

On the other hand,

$$[(R \circ S) \circ T] = [R \circ S][T] = ([R][S])[T] = (AB)C.$$

Since composition of functions is associative,

$$R \circ (S \circ T) = (R \circ S) \circ T,$$

and so

$$A(BC) = [R \circ (S \circ T)] = [(R \circ S) \circ T] = (AB)C.$$ •

We can prove Corollary 4.71 directly, although it is rather tedious. The con- 
nection with composition of linear transformations is the real reason why matrix 
multiplication is associative. 

Corollary 4.72. Let $T : V \to W$ be a linear transformation of vector spaces
$V$ and $W$ over a field $k$, and let $X$ and $Y$ be bases of $V$ and $W$, respectively. If $T$
is nonsingular and $A = {}_Y[T]_X$, then $A$ is a nonsingular matrix and the matrix
of $T^{-1}$ is

$${}_X[T^{-1}]_Y = A^{-1} = ({}_Y[T]_X)^{-1}.$$

Proof.

$$I = {}_Y[1_W]_Y = ({}_Y[T]_X)({}_X[T^{-1}]_Y)$$

and

$$I = {}_X[1_V]_X = ({}_X[T^{-1}]_Y)({}_Y[T]_X).$$ •

The next corollary determines all the matrices arising from the same linear 
transformation. 

Corollary 4.73. Let $T : V \to V$ be a linear transformation on a vector space
$V$ over a field $k$. If $X$ and $Y$ are bases of $V$, then there is a nonsingular matrix
$P$ with entries in $k$, namely, $P = {}_Y[1_V]_X$, so that

$${}_Y[T]_Y = P\,({}_X[T]_X)\,P^{-1}.$$

Conversely, if $B = PAP^{-1}$, where $B$, $A$, and $P$ are $n \times n$ matrices with entries
in $k$ and $P$ is nonsingular, then there is a linear transformation $T : k^n \to k^n$ and
bases $X$ and $Y$ of $k^n$ such that $B = {}_Y[T]_Y$ and $A = {}_X[T]_X$.

Proof. The first statement follows from Proposition 4.70 and associativity:

$${}_Y[T]_Y = {}_Y[1_V T 1_V]_Y = ({}_Y[1_V]_X)({}_X[T]_X)({}_X[1_V]_Y).$$

Set $P = {}_Y[1_V]_X$, and note that Corollary 4.72 gives $P^{-1} = {}_X[1_V]_Y$.

For the converse, let $E = e_1, \ldots, e_n$ be the standard basis of $k^n$, and define
$T : k^n \to k^n$ by $T(e_j) = Ae_j$ (remember that vectors in $k^n$ are column vectors,
so that $Ae_j$ is matrix multiplication; indeed, $Ae_j$ is the $j$th column of $A$). It
follows that $A = {}_E[T]_E$. Define a list $Y = y_1, \ldots, y_n$ by $y_j = P^{-1}e_j$; that is,
the vectors in $Y$ are the columns of $P^{-1}$. If $Y$ is a basis, then it suffices to prove
that $B = {}_Y[T]_Y$; that is, $T(y_j) = \sum_i b_{ij} y_i$, where $B = [b_{ij}]$.

$$\begin{aligned}
T(y_j) &= Ay_j \\
&= AP^{-1}e_j \\
&= P^{-1}Be_j \\
&= P^{-1}\sum_i b_{ij} e_i \\
&= \sum_i b_{ij} P^{-1}e_i \\
&= \sum_i b_{ij} y_i.
\end{aligned}$$

Let us show that $Y = P^{-1}e_1, \ldots, P^{-1}e_n$ is a basis of $k^n$. If $\sum_j a_j P^{-1}e_j = 0$,
then $P^{-1}\big(\sum_j a_j e_j\big) = 0$; multiplying on the left by $P$ gives $\sum_j a_j e_j = 0$, and
linear independence of the standard basis gives all $a_j = 0$. Thus, $Y$ is linearly
independent. To see that $Y$ spans $k^n$, take $w \in k^n$. Now $Pw = \sum_j b_j e_j$, and so
$w = P^{-1}Pw = \sum_j b_j P^{-1}e_j \in \langle Y \rangle$. Therefore, $Y$ is a basis. •

Definition. Two $n \times n$ matrices $B$ and $A$ over a field $k$ are similar if there is a
nonsingular matrix $P$ over $k$ with $B = PAP^{-1}$.

Corollary 4.73 says that two matrices are similar if and only if they arise
from the same linear transformation on a vector space $V$ (from different choices
of basis). For example, the matrices $C$ and $J$ in Example 4.68 are similar. The
first matrix $C$ arises from a linear transformation $T : k^3 \to k^3$ relative to the
standard basis $E$; that is, $C = {}_E[T]_E$. The second matrix $J$ arises from the basis
$X$ in that example; that is, $J = {}_X[T]_X$.
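The converse half of Corollary 4.73 can be watched in action on a $2 \times 2$ case. The sketch below uses exact rational arithmetic; the particular matrices `A` and `P_inv` are illustrative choices, and the helper names are ours. It forms $B = PAP^{-1}$ and then confirms that, for the basis $Y$ consisting of the columns of $P^{-1}$, we have $T(y_j) = \sum_i b_{ij} y_i$.

```python
from fractions import Fraction

def mat_mul(A, B):
    """Matrix product of two matrices given as lists of rows."""
    return [[sum(A[i][p] * B[p][j] for p in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def inverse_2x2(M):
    """Inverse of a 2 x 2 matrix over Q (raises if the determinant is 0)."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[0, -1], [1, 0]]        # matrix of T in the standard basis E
P_inv = [[4, -2], [1, 1]]    # columns are y_1 = (4, 1) and y_2 = (-2, 1)
P = inverse_2x2(P_inv)
B = mat_mul(mat_mul(P, A), P_inv)   # B = P A P^{-1}

# Check that B is the matrix of T relative to Y: T(y_j) = sum_i b_ij y_i.
checks = []
for j in range(2):
    y_j = [P_inv[0][j], P_inv[1][j]]
    lhs = [sum(A[r][c] * y_j[c] for c in range(2)) for r in range(2)]
    rhs = [sum(B[i][j] * P_inv[r][i] for i in range(2)) for r in range(2)]
    checks.append(lhs == rhs)
print(B, checks)
```

Using `Fraction` rather than floating point keeps the comparison exact, which matters here because the entries of $B$ are sixths.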

Example 4.74.

We can now simplify the calculations in Example 4.67(ii). Recall that we have
two bases of $\mathbb{R}^2$: the standard basis $E = e_1, e_2$ and $F = v_1, v_2$, where
$v_1 = \begin{bmatrix} 4 \\ 1 \end{bmatrix}$ and $v_2 = \begin{bmatrix} -2 \\ 1 \end{bmatrix}$,
and the linear transformation $T : \mathbb{R}^2 \to \mathbb{R}^2$, rotation by 90°,
with matrix

$${}_E[T]_E = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}.$$

Now the transition matrices are

$${}_E[1]_F = \begin{bmatrix} 4 & -2 \\ 1 & 1 \end{bmatrix}$$

and

$$P = {}_F[1]_E = ({}_E[1]_F)^{-1} = \tfrac{1}{6}\begin{bmatrix} 1 & 2 \\ -1 & 4 \end{bmatrix}.$$

Therefore,

$${}_F[T]_F = P\,({}_E[T]_E)\,P^{-1} = \tfrac{1}{6}\begin{bmatrix} 7 & -5 \\ 17 & -7 \end{bmatrix},$$

which agrees with our earlier result on page 371. ◄

Just as for group homomorphisms and ring homomorphisms, we can define
the kernel and image of linear transformations.

Definition. If $T : V \to W$ is a linear transformation, then the kernel (or the
null space) of $T$ is

$$\ker T = \{v \in V : T(v) = 0\},$$

and the image of $T$ is

$$\operatorname{im} T = \{w \in W : w = T(v) \text{ for some } v \in V\}.$$

As in Example 4.60(iv), an $m \times n$ matrix $A$ with entries in a field $k$ determines
a linear transformation $T_A : k^n \to k^m$, namely, $T_A(y) = Ay$, where $y$ is
an $n \times 1$ column vector. The kernel of $T_A$ is the solution space $\operatorname{Sol}(A)$ [see
Example 4.3(iv)], and the image of $T_A$ is the column space $\operatorname{Col}(A)$.

The proof of the next proposition is routine. 


Proposition 4.75. Let $T : V \to W$ be a linear transformation.

(i) $\ker T$ is a subspace of $V$, and $\operatorname{im} T$ is a subspace of $W$.

(ii) $T$ is injective if and only if $\ker T = \{0\}$.

We can now give a new proof of Corollary 4.20 that a homogeneous system
over a field $k$ with $r$ equations in $n$ unknowns has a nontrivial solution if $r < n$.
If $A$ is the $r \times n$ coefficient matrix of the system, then $T : x \mapsto Ax$ is a linear
transformation $T : k^n \to k^r$. If there is only the trivial solution, then $\ker T = \{0\}$,
so that $k^n$ is isomorphic to a subspace of $k^r$, contradicting Corollary 4.25(i).
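To make the statement about homogeneous systems concrete, here is a minimal sketch that computes a basis of $\ker T_A$ over $\mathbb{Q}$ by row reduction. The function names `rref` and `kernel_basis` and the sample matrix are our own illustrative choices; the elimination method is the standard one, not a construction from this section.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row echelon form over Q; returns (matrix, list of pivot columns)."""
    M = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(M[0])):
        pr = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pr is None:
            continue                      # no pivot available in this column
        M[r], M[pr] = M[pr], M[r]
        pivot = M[r][c]
        M[r] = [x / pivot for x in M[r]]  # scale the pivot row to a leading 1
        for i in range(len(M)):
            if i != r:
                factor = M[i][c]
                M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
        if r == len(M):
            break
    return M, pivots

def kernel_basis(rows):
    """Basis of {x : Ax = 0}, one basis vector for each non-pivot (free) column."""
    R, pivots = rref(rows)
    n = len(rows[0])
    basis = []
    for f in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[f] = Fraction(1)
        for i, p in enumerate(pivots):
            v[p] = -R[i][f]               # solve row i for its pivot variable
        basis.append(v)
    return basis

A = [[1, 2, 3], [4, 5, 6]]    # r = 2 equations in n = 3 unknowns
solutions = kernel_basis(A)
print(solutions)
```

Since there are at most $r$ pivot columns, at least $n - r$ columns are free when $r < n$, and each free column contributes a nonzero solution.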
Lemma 4.76. Let $T : V \to W$ be a linear transformation.

(i) Let $T$ be nonsingular. For every basis $X = v_1, v_2, \ldots, v_n$ of $V$, we have
$T(X) = T(v_1), T(v_2), \ldots, T(v_n)$ is a basis of $W$.

(ii) Conversely, if there exists some basis $X = v_1, v_2, \ldots, v_n$ of $V$ for which
$T(X) = T(v_1), T(v_2), \ldots, T(v_n)$ is a basis of $W$, then $T$ is nonsingular.

Proof.

(i) If $\sum c_i T(v_i) = 0$, then $T\big(\sum c_i v_i\big) = 0$, and so $\sum c_i v_i \in \ker T = \{0\}$.
Hence each $c_i = 0$, because $X$ is linearly independent. If $w \in W$, then the
surjectivity of $T$ provides $v \in V$ with $w = T(v)$. But $v = \sum a_i v_i$, and so
$w = T(v) = T\big(\sum a_i v_i\big) = \sum a_i T(v_i)$. Therefore, $T(X)$ is a basis of $W$.

(ii) If $w \in W$, then $w = \sum c_i T(v_i) = T\big(\sum c_i v_i\big)$, since $T(v_1), \ldots, T(v_n)$ is a
basis of $W$, and so $T$ is surjective. If $\sum c_i v_i \in \ker T$, then $\sum c_i T(v_i) = 0$, and
so linear independence gives all $c_i = 0$; hence, $\sum c_i v_i = 0$ and $\ker T = \{0\}$.
Therefore, $T$ is nonsingular. •

Theorem 4.77. If $V$ is an $n$-dimensional vector space over a field $k$, then $V$ is
isomorphic to $k^n$.

Proof. Choose a basis $v_1, \ldots, v_n$ of $V$. If $e_1, \ldots, e_n$ is the standard basis of $k^n$,
then Theorem 4.61 says that there is a linear transformation $T : V \to k^n$ with
$T(v_i) = e_i$ for all $i$; by Lemma 4.76, $T$ is nonsingular. •

Theorem 4.77 says more than that every finite-dimensional vector space is 
essentially the familiar vector space of all n -tuples. It says that a choice of basis 
in V is tantamount to a choice of coordinate list for each vector in V. We want 
the freedom to change coordinates because the usual coordinates may not be the 
most convenient ones for a given problem, as the reader has probably seen (in a 
calculus course) when rotating axes to simplify the equation of a conic section. 

Corollary 4.78. Two finite-dimensional vector spaces $V$ and $W$ over a field $k$
are isomorphic if and only if $\dim(V) = \dim(W)$.

Proof. Assume that there is a nonsingular $T : V \to W$. If $X = v_1, \ldots, v_n$
is a basis of $V$, then Lemma 4.76 says that $T(v_1), \ldots, T(v_n)$ is a basis of $W$.
Therefore, $\dim(W) = |X| = \dim(V)$.

If $n = \dim(V) = \dim(W)$, then there are isomorphisms $T : V \to k^n$ and
$S : W \to k^n$, by Theorem 4.77. It follows that the composite $S^{-1} \circ T : V \to W$
is an isomorphism. •

Proposition 4.79. Let $V$ be a finite-dimensional vector space over a field $k$
with $\dim(V) = n$, and let $T : V \to V$ be a linear transformation. The following
statements are equivalent:

(i) $T$ is an isomorphism;

(ii) $T$ is surjective;

(iii) $T$ is injective.

Remark. Compare this proposition with the pigeonhole principle, Exercise 2.12
on page 102. ◄

Proof.

(i) $\Rightarrow$ (ii) This implication is obvious.

(ii) $\Rightarrow$ (iii) Assume that $T$ is surjective. If $X = v_1, \ldots, v_n$ is a basis of $V$, we
claim that $T(X) = T(v_1), \ldots, T(v_n)$ spans $V$. If $w \in V$, then surjectivity of
$T$ gives $v \in V$ with $w = T(v)$. Now $v = \sum a_i v_i$ for scalars $a_i \in k$, and so
$w = T(v) = \sum a_i T(v_i)$. Since $\dim(V) = n$, it follows from Corollary 4.24
that $T(X)$ is a basis of $V$. Lemma 4.76 now says that $T$ is an isomorphism, and
so $T$ is injective.

(iii) $\Rightarrow$ (i) Assume that $T$ is injective. If $X = v_1, \ldots, v_n$ is a basis of $V$, then we
claim that $T(X) = T(v_1), \ldots, T(v_n)$ is linearly independent. If $\sum_i c_i T(v_i) = 0$,
then $T\big(\sum_i c_i v_i\big) = 0$, so that $\sum_i c_i v_i \in \ker T = \{0\}$. Hence, $\sum_i c_i v_i = 0$,
and linear independence of $X$ gives all $c_i = 0$. Therefore, $T(X)$ is linearly
independent. Since $\dim(V) = n$, it follows from Corollary 4.24 that $T(X)$ is a
basis of $V$. Lemma 4.76 now says that $T$ is an isomorphism. •

Call a linear transformation $T : V \to V$ singular if it is not nonsingular.

Corollary 4.80. Let $V$ be a finite-dimensional vector space, and let $T : V \to V$
be a linear transformation on $V$. Then $T$ is singular if and only if there exists a
nonzero vector $v \in V$ with $T(v) = 0$.

Proof. If $T$ is singular, then $\ker T \neq \{0\}$, by Proposition 4.79. Conversely, if
there is a nonzero vector $v$ with $T(v) = 0$, then $\ker T \neq \{0\}$ and $T$ is singular. •

This corollary says that an n x n linear system Ax = 0 with a singular 
coefficient matrix A always has a nontrivial solution. 

Recall that an n x n matrix A with entries in a field k is nonsingular if there is 
a matrix B (its inverse) with entries in k with AB = I = BA. The next corollary 
shows that “one-sided inverses” are enough. 


Corollary 4.81. Let $A$ and $B$ be $n \times n$ matrices with entries in a field $k$. If
$AB = I$, then $BA = I$. Therefore, $A$ is nonsingular and $B = A^{-1}$.

Proof. There are linear transformations $T, S : k^n \to k^n$ with ${}_X[T]_X = A$ and
${}_X[S]_X = B$, where $X$ is the standard basis. Let us abbreviate ${}_X[T]_X$ to $[T]$ in
this proof. Since $AB = I$, Proposition 4.70 gives

$$[T \circ S] = [T][S] = I = [1_{k^n}].$$

Since $T \mapsto [T]$ is a bijection, by Proposition 4.69, it follows that $T \circ S = 1_{k^n}$. By
Proposition 2.9, $T$ is a surjection and $S$ is an injection. But Proposition 4.79 says
that both $T$ and $S$ are isomorphisms, so that $S = T^{-1}$ and $T \circ S = 1_{k^n} = S \circ T$.
Therefore, $I = [S \circ T] = [S][T] = BA$, as desired. •

Proposition 4.82. Let $T : V \to W$ be a linear transformation, where $V$ and
$W$ are vector spaces over a field $k$ of dimension $n$ and $m$, respectively. Then

$$\dim(\ker T) + \dim(\operatorname{im} T) = n.$$

Proof. Choose a basis $u_1, \ldots, u_p$ of $\ker T$, and extend it to a basis of $V$ by
adjoining vectors $w_1, \ldots, w_q$. Now $\operatorname{im} T$ is spanned by the list $T(u_1), \ldots, T(u_p)$,
$T(w_1), \ldots, T(w_q)$; but $T(u_i) = 0$ for all $i$, so that $\operatorname{im} T$ is spanned by the
shorter list $T(w_1), \ldots, T(w_q)$. Since $\dim(\ker T) = p$ and $p + q = n$, it suffices
to prove that $T(w_1), \ldots, T(w_q)$ is a linearly independent list.

If $c_1 T(w_1) + \cdots + c_q T(w_q) = 0$, then $T(c_1 w_1 + \cdots + c_q w_q) = 0$ and
$c_1 w_1 + \cdots + c_q w_q \in \ker T$. Hence, there are $a_1, \ldots, a_p \in k$ with

$$c_1 w_1 + \cdots + c_q w_q = a_1 u_1 + \cdots + a_p u_p.$$

Since $u_1, \ldots, u_p, w_1, \ldots, w_q$ is a basis of $V$, it is a linearly independent list,
so that $0 = c_1 = \cdots = c_q$ (and also $0 = a_1 = \cdots = a_p$). Therefore,
$T(w_1), \ldots, T(w_q)$ is a basis of $\operatorname{im} T$, and $\dim(\operatorname{im} T) = q$. •

Corollary 4.83. If $A$ is an $m \times n$ matrix over a field $k$, then

$$\operatorname{rank}(A) = \operatorname{rank}(A^T);$$

the row space $\operatorname{Row}(A)$ and the column space $\operatorname{Col}(A)$ have the same dimension.

Proof. Define $T_A : k^n \to k^m$ by $T_A(v) = Av$. Now $\ker T_A$ is the solution
space of the linear system $Ax = 0$, so that Theorem 4.42 gives $\dim(\ker T_A) = n - r$,
where $r = \operatorname{rank}(A)$ [recall that $\operatorname{rank}(A) = \dim(\operatorname{Row}(A))$]. However,
$\dim(\ker T_A) = n - \dim(\operatorname{im} T_A)$, by Proposition 4.82. Therefore,

$$\operatorname{rank}(A) = r = \dim(\operatorname{im} T_A).$$

Finally, note that $\operatorname{im} T_A$ is the set of all the linear combinations of the columns of
$A$; that is, $\operatorname{im} T_A = \operatorname{Col}(A) = \operatorname{Row}(A^T)$. Therefore, $\operatorname{rank}(A) = \dim(\operatorname{Col}(A))$. •
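As a sanity check on Corollary 4.83, one can compute the rank of a matrix and of its transpose by Gaussian elimination and compare. This is only a sketch over $\mathbb{Q}$; the function names and the sample matrix are hypothetical choices made for illustration.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix over Q: the number of pivots found by Gaussian elimination."""
    M = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(M[0])):
        pr = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pr is None:
            continue                     # column c contributes no pivot
        M[r], M[pr] = M[pr], M[r]
        for i in range(r + 1, len(M)):
            factor = M[i][c] / M[r][c]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == len(M):
            break
    return r

def transpose(rows):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*rows)]

A = [[1, 2, 3, 4], [2, 4, 6, 8], [0, 1, 1, 0]]   # second row is twice the first
row_rank = rank(A)
col_rank = rank(transpose(A))
print(row_rank, col_rank, len(A[0]) - row_rank)   # last value is dim(ker T_A)
```

The last printed value illustrates Proposition 4.82 as well: with $n = 4$ columns, the kernel has dimension $n - \operatorname{rank}(A)$.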
Definition. If $V$ is a vector space over a field $k$, then the general linear group,
denoted by $\operatorname{GL}(V)$, is the set of all nonsingular linear transformations $V \to V$.

A composite $S \circ T$ of linear transformations $S$ and $T$ is again a linear
transformation, and $S \circ T$ is nonsingular if both $S$ and $T$ are; moreover, the inverse of
a nonsingular linear transformation is again nonsingular. It follows that $\operatorname{GL}(V)$
is a group with composition as operation, for composition of functions is always
associative.


Definition. The set of all nonsingular n x n matrices with entries in a field k is 
denoted by GL (n, k). 

It is easy to prove that GL (n, k) is a group under matrix multiplication. 

A choice of basis gives an isomorphism between the general linear group 
and the group of nonsingular matrices. 


Proposition 4.84. Let $V$ be an $n$-dimensional vector space over a field $k$, and
let $X = v_1, \ldots, v_n$ be a basis of $V$. Then $\mu : \operatorname{GL}(V) \to \operatorname{GL}(n, k)$, defined by
$T \mapsto {}_X[T]_X$, is an isomorphism of groups.

Proof. By Proposition 4.69, the function $\mu_{X,X} : T \mapsto [T] = {}_X[T]_X$ is an
isomorphism of vector spaces

$$\operatorname{Hom}_k(V, V) \to \operatorname{Mat}_n(k).$$

Moreover, Proposition 4.70 says that $\mu_{X,X}(T \circ S) = \mu_{X,X}(T)\,\mu_{X,X}(S)$ for all
$T, S \in \operatorname{Hom}_k(V, V)$.

If $T \in \operatorname{GL}(V)$, then $\mu_{X,X}(T) = {}_X[T]_X$ is a nonsingular matrix, by
Corollary 4.72; thus, if $\mu$ is the restriction of $\mu_{X,X}$, then $\mu : \operatorname{GL}(V) \to \operatorname{GL}(n, k)$ is
an injective homomorphism.

It remains to prove that $\mu$ is surjective. Since $\mu_{X,X}$ is surjective, if
$A \in \operatorname{GL}(n, k)$, then $A = {}_X[T]_X$ for some $T : V \to V$. It suffices to show that $T$ is
an isomorphism, for then $T \in \operatorname{GL}(V)$. Since $A$ is a nonsingular matrix, there is
a matrix $B$ with $AB = I$. Now $B = {}_X[S]_X$ for some $S : V \to V$, and

$$\mu_{X,X}(T \circ S) = \mu_{X,X}(T)\,\mu_{X,X}(S) = AB = I = \mu_{X,X}(1_V).$$

Therefore, $T \circ S = 1_V$, since $\mu_{X,X}$ is an injection, and so $T \in \operatorname{GL}(V)$, by
Corollary 4.81. •

The center of the general linear group is easily identified; we now generalize
Exercise 2.74 on page 167.

Definition. A linear transformation $T : V \to V$ is a scalar transformation if
there is $c \in k$ with $T(v) = cv$ for all $v \in V$; that is, $T = c1_V$. A scalar matrix
is a matrix of the form $cI$, where $c \in k$ and $I$ is the identity matrix.

A scalar transformation $T = c1_V$ is nonsingular if and only if $c \neq 0$ (its
inverse is $c^{-1}1_V$).

Corollary 4.85.

(i) The center of the group $\operatorname{GL}(V)$ consists of all the nonsingular scalar
transformations.

(ii) The center of the group $\operatorname{GL}(n, k)$ consists of all the nonsingular scalar
matrices.

Proof.

(i) If $T \in \operatorname{GL}(V)$ is not scalar, then there is some vector $v \in V$ with $T(v)$
not a scalar multiple of $v$. Of course, this forces $v \neq 0$. We claim that the
list $X = v, T(v)$ is linearly independent. We know that $T(v)$ is not a scalar
multiple of $v$. If $v = dT(v)$, for some $d \in k$, then $d \neq 0$ (lest $v = 0$), and
so $T(v) = d^{-1}v$, a contradiction. Hence, $v, T(v)$ is a linearly independent list,
by Example 4.12(iii). By Proposition 4.22, this list can be extended to a basis
$v, T(v), u_3, \ldots, u_n$ of $V$. It is easy to see that $v, v + T(v), u_3, \ldots, u_n$ is also a
basis of $V$, and so there is a nonsingular linear transformation $S$ with $S(v) = v$,
$S(T(v)) = v + T(v)$, and $S(u_i) = u_i$ for all $i$. Now $S$ and $T$ do not commute,
for $ST(v) = v + T(v)$ while $TS(v) = T(v)$. Therefore, $T$ is not in the center of
$\operatorname{GL}(V)$.

(ii) If $f : G \to H$ is any group isomorphism between groups $G$ and $H$, then
$f(Z(G)) = Z(H)$. In particular, if $T = c1_V$ is a nonsingular scalar transformation,
then ${}_X[T]_X$ is in the center of $\operatorname{GL}(n, k)$ for any basis $X = v_1, \ldots, v_n$ of $V$.
But $T(v_i) = cv_i$ for all $i$, so that ${}_X[T]_X = cI$ is a scalar matrix. •


Exercises 

4.28 Let $k$ be a field, let $V = k[x]$, the polynomial ring viewed as a vector space over $k$,
and let $V_n = \langle 1, x, x^2, \ldots, x^n \rangle$. By Exercise 4.7 on page 343, we know that
$X_n = 1, x, x^2, \ldots, x^n$ is a basis of $V_n$.

(i) Prove that differentiation $T : V_3 \to V_3$, defined by $T(f(x)) = f'(x)$, is a
linear transformation, and find the matrix $A = {}_{X_3}[T]_{X_3}$ of differentiation.

(ii) Prove that integration $S : V_3 \to V_4$, defined by $S(f) = \int_0^x f(t)\,dt$, is a
linear transformation, and find the matrix $A = {}_{X_4}[S]_{X_3}$ of integration.

4.29 If $\sigma \in S_n$ and $P = P_\sigma$ is the corresponding permutation matrix (see
Example 4.65), prove that $P^{-1} = P^T$.

4.30 Let $A$ be an $n \times n$ real symmetric matrix.

(i) Give an example of a nonsingular matrix $P$ for which $PAP^{-1}$ is not
symmetric.

(ii) Prove that $OAO^{-1}$ is symmetric for every $n \times n$ real orthogonal matrix $O$.
*4.31 Let $V$ and $W$ be vector spaces over a field $k$, and let $S, T : V \to W$ be linear
transformations.

(i) If $V$ and $W$ are finite-dimensional, prove that

$$\dim(\operatorname{Hom}_k(V, W)) = \dim(V)\dim(W).$$

(ii) The dual space $V^*$ of a vector space $V$ over $k$ is defined by

$$V^* = \operatorname{Hom}_k(V, k).$$

If $X = v_1, \ldots, v_n$ is a basis of $V$, define $\delta_1, \ldots, \delta_n \in V^*$ by

$$\delta_i(v_j) = \begin{cases} 0 & \text{if } j \neq i \\ 1 & \text{if } j = i. \end{cases}$$

Prove that $\delta_1, \ldots, \delta_n$ is a basis of $V^*$ (it is called the dual basis arising
from $v_1, \ldots, v_n$).

(iii) If $\dim(V) = n$, prove that $\dim(V^*) = n$, and hence that $V^* \cong V$.


Remark. Here is a convincing reason why targets are necessary in a function's
definition. In linear algebra, one considers a vector space $V$ and its dual space
$V^* = \{\text{all linear functionals on } V\}$ (which is also a vector space). Moreover, every
linear transformation $S : V \to W$ defines a linear transformation

$$S^* : W^* \to V^*,$$

and the domain of $S^*$, being $W^*$, is determined by the target $W$ of $S$. (In fact, if a matrix
for $S$ is $A$, then a matrix for $S^*$ is $A^T$, the transpose of $A$.) Thus, changing the target of
$S$ changes the domain of $S^*$, so that $S^*$ is changed in an essential way; hence the target
is an essential part of the definition of a function.

4.32 (i) If $S : V \to W$ is a linear transformation and $f \in W^*$, then the composite
$V \xrightarrow{\;S\;} W \xrightarrow{\;f\;} k$ lies in $V^*$. Prove that $S^* : W^* \to V^*$, defined by
$S^* : f \mapsto f \circ S$, is a linear transformation.

(ii) If $X = v_1, \ldots, v_n$ and $Y = w_1, \ldots, w_m$ are bases of $V$ and $W$, respectively,
denote the dual bases by $X^*$ and $Y^*$ (see Exercise 4.31). Prove
that if $S : V \to W$ is a linear transformation, then the matrix of $S^*$ is a
transpose:

$${}_{X^*}[S^*]_{Y^*} = ({}_Y[S]_X)^T.$$

4.33 (i) If $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$, define $\det(A) = ad - bc$. Given a system of linear
equations with coefficient matrix $A$ and coefficients in a field,

$$\begin{aligned} ax + by &= p \\ cx + dy &= q, \end{aligned}$$

prove that there exists a unique solution if and only if $\det(A) \neq 0$.

(ii) If $V$ is a vector space with basis $X = v_1, v_2$, define $T : V \to V$ by
$T(v_1) = av_1 + bv_2$ and $T(v_2) = cv_1 + dv_2$. Prove that $T$ is a nonsingular
linear transformation if and only if $\det({}_X[T]_X) \neq 0$.

*4.34 Let $U$ be a subspace of a vector space $V$.

(i) Prove that the natural map $\pi : V \to V/U$, given by $v \mapsto v + U$, is
a linear transformation with kernel $U$. (Quotient spaces were defined in
Exercise 4.15 on page 344.)

(ii) State and prove the first isomorphism theorem for vector spaces.

4.35 Let $V$ be a finite-dimensional vector space over a field $k$, and let $\mathcal{B}$ denote the
family of all the bases of $V$. Prove that $\mathcal{B}$ is a transitive $\operatorname{GL}(V)$-set.

4.36 Recall that if $U$ and $W$ are subspaces of a vector space $V$ such that $U \cap W = \{0\}$
and $U + W = V$, then $U$ is called a direct summand of $V$ and $W$ is called a
complement of $U$. In Exercise 4.19 on page 345, we saw that every subspace of a
finite-dimensional vector space is a direct summand.

(i) Let $U = \{(a, a) : a \in \mathbb{R}\}$. Find all the complements of $U$ in $\mathbb{R}^2$.

(ii) If $U$ is a subspace of a finite-dimensional vector space $V$, prove that any
two complements of $U$ are isomorphic.

4.37 Let $T : V \to W$ be a linear transformation between vector spaces over a field $k$,
and define

$$\operatorname{rank}(T) = \dim(\operatorname{im} T).$$

(i) Let $A$ be an $m \times n$ matrix over $k$. If $T_A : k^n \to k^m$ is the linear
transformation defined by $T_A(x) = Ax$, where $x$ is an $n \times 1$ column vector,
prove that $\operatorname{rank}(A) = \operatorname{rank}(T_A)$. Conclude that one can compute the rank
of a linear transformation $T : V \to W$ by $\operatorname{rank}(T) = \operatorname{rank}(A)$, where
$A = {}_Y[T]_X$ for bases $X$ of $V$ and $Y$ of $W$.

(ii) Prove that similar $n \times n$ matrices have the same rank.

(iii) If $A$ is an $m \times n$ matrix and $B$ is a $p \times m$ matrix, prove that

$$\operatorname{rank}(BA) \leq \operatorname{rank}(A).$$

4.38 Let $\mathbb{R}^n$ be equipped with the usual inner product: if $v = (a_1, \ldots, a_n)$ and
$u = (b_1, \ldots, b_n)$, then $(v, u) = a_1 b_1 + \cdots + a_n b_n$.

(i) A linear transformation $U : \mathbb{R}^n \to \mathbb{R}^n$ is called orthogonal if $(Uv, Uw) = (v, w)$
for all $v, w \in \mathbb{R}^n$.

Prove that every orthogonal transformation is nonsingular.

(ii) An orthonormal basis of $\mathbb{R}^n$ is a basis $v_1, \ldots, v_n$ such that

$$(v_i, v_j) = \delta_{ij},$$

where $(v_i, v_j)$ is the inner product and $\delta_{ij}$ is the Kronecker delta. For
example, the standard basis is an orthonormal basis for the usual inner
product.

Prove that a linear transformation $U : \mathbb{R}^n \to \mathbb{R}^n$ is orthogonal if and
only if $U(v_1), \ldots, U(v_n)$ is an orthonormal basis whenever $v_1, \ldots, v_n$ is
an orthonormal basis.

(iii) If $w \in \mathbb{R}^n$ and $v_1, \ldots, v_n$ is an orthonormal basis, then $w = \sum_{i=1}^{n} c_i v_i$.
Prove that $c_i = (w, v_i)$.

4.39 Let $U : \mathbb{R}^n \to \mathbb{R}^n$ be an orthogonal transformation, and let $X = v_1, \ldots, v_n$ be an
orthonormal basis. If $O = {}_X[U]_X$, prove that $O^{-1} = O^T$. (The matrix $O$ is called
an orthogonal matrix.)


4.4 Determinants 

We introduce determinants of square matrices, and we will use them to investi- 
gate invertibility. Several important results in this section will be stated without 
proof. 

The usual, though inelegant, definition of the determinant of an $n \times n$ real
matrix $A = [a_{ij}]$ is

$$\det(A) = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma)\, a_{\sigma(1),1}\, a_{\sigma(2),2} \cdots a_{\sigma(n),n}.$$

Recall that $\operatorname{sgn}(\sigma) = \pm 1$: it is $+1$ if $\sigma$ is an even permutation, and it is $-1$ if $\sigma$ is
odd. The term $a_{\sigma(1),1}\, a_{\sigma(2),2} \cdots a_{\sigma(n),n}$ has exactly one factor from each column
of $A$ because all the second subscripts are distinct, and it has exactly one factor
from each row because all the first subscripts are distinct. One often calls this
formula the complete expansion of the determinant. From this definition, we see
that the formula for $\det(A)$ makes sense for $n \times n$ matrices $A$ with entries in any
commutative ring $R$.
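The complete expansion can be transcribed almost word for word. The sketch below is illustrative only (the names `sign` and `det_complete` are ours); it works for entries in any ring whose elements Python can add and multiply, and it is shown here with integer entries. Permutations are of $0, \ldots, n-1$ rather than $1, \ldots, n$, which does not affect the formula.

```python
from itertools import permutations

def sign(perm):
    """Sign of a permutation of 0..n-1: +1 if the number of inversions is even."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def det_complete(A):
    """Complete expansion: sum over sigma of sgn(sigma) * prod_j A[sigma(j)][j]."""
    n = len(A)
    total = 0
    for sigma in permutations(range(n)):
        term = sign(sigma)
        for col in range(n):
            term *= A[sigma[col]][col]    # the factor a_{sigma(col), col}
        total += term
    return total

print(det_complete([[1, 2], [3, 4]]))
print(det_complete([[2, 0, 1], [1, 3, 2], [1, 1, 4]]))
```

The sum has $n!$ terms, which is why this formula, although valid over any commutative ring, is rarely used for computation beyond small $n$.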

A better way to view the determinant is to consider it as a function

$$D = D_n : \operatorname{Mat}_n(R) \to R.$$

We now axiomatize some desirable properties of $D$, prove that these properties
characterize $D$ and, finally, prove that such a function $D$ exists. Regard an $n \times n$
matrix $A = [a_{ij}]$, not as $n^2$ entries, but rather as the list $\alpha_1, \ldots, \alpha_n$ of its rows,
where $\alpha_i = (a_{i1}, \ldots, a_{in}) \in R^n$ (here, we view $R^n$, the set of all $n$-tuples with
entries in $R$, as an additive abelian group). Given any function of $n$ variables, we
can construct functions of one variable by fixing $n - 1$ variables. In more detail,
given $\alpha_1, \ldots, \alpha_n$, there are functions $d_i : R^n \to R$, one for each $i$, defined by

$$d_i(\beta) = D(\alpha_1, \ldots, \alpha_{i-1}, \beta, \alpha_{i+1}, \ldots, \alpha_n);$$

of course, the notation $d_i$ is too abbreviated: it depends on $D$ and on the list of
rows $\alpha_1, \ldots, \widehat{\alpha_i}, \ldots, \alpha_n$.

Definition. Let $R$ be a commutative ring. An $n \times n$ determinant function is a
function $D : \operatorname{Mat}_n(R) \to R$ with the following properties:

(i) $D$ is alternating: $D(A) = D(\alpha_1, \ldots, \alpha_n) = 0$ if two rows of $A$ are equal;

(ii) $D$ is multilinear: for each list $\alpha_1, \ldots, \alpha_n$, the functions $d_i : R^n \to R$,
given by $d_i(\beta) = D(\alpha_1, \ldots, \alpha_{i-1}, \beta, \alpha_{i+1}, \ldots, \alpha_n)$, satisfy

$$d_i(\beta + \gamma) = d_i(\beta) + d_i(\gamma) \quad \text{and} \quad d_i(c\beta) = c\,d_i(\beta)$$

for all $c \in R$;

(iii) $D(e_1, \ldots, e_n) = 1$, where $e_1, \ldots, e_n$ is the standard basis; that is, if $I$ is
the identity matrix, then $D(I) = 1$.

One can prove that any determinant function $D$ satisfies the following
properties:

$$D(AB) = D(A)D(B) \tag{1}$$

and

$$D(A^T) = D(A), \tag{2}$$

where $A^T$ is the transpose of $A$. Moreover, it can be shown that $D(A)$ must equal
the complete expansion, and so $D$ is unique, if it exists. Let us be more precise.
For each $n \geq 1$, there is at most one determinant function $D : \operatorname{Mat}_n(R) \to R$.

The rather long proof of existence of a determinant function $\operatorname{Mat}_n(R) \to R$
is by induction on $n \geq 1$ (see Curtis, Linear Algebra, for example, which
considers the special case when $R$ is a field. There is a proof for arbitrary
commutative rings, using exterior algebra, in my book, Advanced Modern Algebra). If
$A = [a_{11}]$ is a $1 \times 1$ matrix, define $\det(A) = a_{11}$. For the inductive step,
assume that there exists a (necessarily unique) determinant function defined on all
$(n - 1) \times (n - 1)$ matrices over $R$. Define, for any fixed $i$,

$$\det(A) = \sum_j (-1)^{i+j} a_{ij} \det(A_{ij}), \tag{3}$$

where $A_{ij}$ denotes the $(n - 1) \times (n - 1)$ matrix obtained from $A$ by deleting its
$i$th row and $j$th column.

Definition. Formula (3) is called the Laplace expansion of $\det(A)$ across the
$i$th row.

Since, for any $i$, Laplace expansion across the $i$th row is a determinant
function, uniqueness implies that the determinant can be computed using Laplace
expansion across any row. Moreover, Eq. (2), $D(A) = D(A^T)$, implies that $\det(A)$
can be computed by Laplace expansion down any column (for transposing
interchanges rows and columns). One of the key virtues of Laplace expansion is that
it is amenable to inductive proofs. For example, Exercise 4.44 on page 398 states
that if $A = [a_{ij}]$ is a triangular matrix (either all entries below the diagonal are 0
or all entries above the diagonal are 0), then $\det(A) = a_{11}a_{22} \cdots a_{nn}$, the product
of the diagonal entries.
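Formula (3) translates directly into a recursive procedure. The following is a minimal sketch, with hypothetical helper names `minor` and `det_laplace`; rows and columns are indexed from 0, which leaves the parity of $i + j$, and hence the sign $(-1)^{i+j}$, unchanged. The parameter `i` selects the row used for the outermost expansion only; the smaller determinants are expanded across their first rows.

```python
def minor(A, i, j):
    """The matrix A_ij: delete row i and column j of A."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(A) if r != i]

def det_laplace(A, i=0):
    """Laplace expansion of det(A) across row i (0-based)."""
    n = len(A)
    if n == 1:
        return A[0][0]
    return sum((-1) ** (i + j) * A[i][j] * det_laplace(minor(A, i, j))
               for j in range(n))

A = [[2, 0, 1], [1, 3, 2], [1, 1, 4]]
across_each_row = [det_laplace(A, i) for i in range(3)]
triangular = det_laplace([[2, 5, 7], [0, 3, 1], [0, 0, 4]])
print(across_each_row, triangular)
```

In this instance the three expansions agree, as the uniqueness argument predicts, and the triangular example returns the product of its diagonal entries.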

When $k$ is a field, there is an efficient way of computing $\det(A)$, using
elementary row operations $A \to A'$ that change a matrix $A$ into a matrix $A'$:

Type I: add a scalar multiple of one row of $A$ to another row;

Type II: multiply one row of $A$ by a nonzero $c \in k$;

Type III: interchange two rows of $A$.

Recall that Gaussian elimination is the process of changing one matrix into
another by repeated application of elementary row operations.

If $A \to A'$ is an elementary row operation, then $\det(A') = r\det(A)$ for
some $r \in k$: if the operation is of type II, then the multilinearity in the definition
of a determinant function shows that $r = c$; Exercise 4.40 shows that $r = 1$
for an elementary operation of type I; Exercise 4.42 shows that $r = -1$ for an
elementary operation of type III. When $k$ is a field, one can put $A$ into triangular
form by Gaussian elimination: there is a sequence of elementary row operations

$$A \to A_1 \to A_2 \to \cdots \to A_q = \Delta,$$

where $\Delta$ is triangular. Therefore, we can compute $\det(A)$ in terms of $\det(\Delta)$ and
this sequence of operations, for Exercise 4.44 on page 398 shows that $\det(\Delta)$ is
the product of its diagonal entries.
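The bookkeeping described above can be sketched as follows. This illustration works over $\mathbb{Q}$ and uses only operations of types I and III, so the only factor to track is a sign; the function name `det_by_elimination` is our own.

```python
from fractions import Fraction

def det_by_elimination(rows):
    """Determinant over Q via reduction to upper triangular form."""
    M = [[Fraction(x) for x in row] for row in rows]
    n = len(M)
    sign = 1
    for c in range(n):
        pr = next((i for i in range(c, n) if M[i][c] != 0), None)
        if pr is None:
            return Fraction(0)            # no pivot: the matrix is singular
        if pr != c:
            M[c], M[pr] = M[pr], M[c]     # type III: determinant changes sign
            sign = -sign
        for i in range(c + 1, n):
            factor = M[i][c] / M[c][c]
            # type I: determinant is unchanged
            M[i] = [a - factor * b for a, b in zip(M[i], M[c])]
    result = Fraction(sign)
    for c in range(n):
        result *= M[c][c]                 # product of the diagonal entries
    return result

print(det_by_elimination([[0, 2, 1], [1, 3, 2], [1, 1, 4]]))
print(det_by_elimination([[2, 0, 1], [1, 3, 2], [1, 1, 4]]))
```

The first example needs a row interchange at the first step, so the sign correction is exercised. Elimination uses on the order of $n^3$ arithmetic operations, in contrast with the $n!$ terms of the complete expansion.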

Let us return to matrices with entries in an arbitrary commutative ring $R$.
We now generalize the definition of an elementary operation of Type II so that it
multiplies one row of $A$ by a unit $c \in R$.

We say that an n x n matrix A over a commutative ring R is invertible if 
there exists a matrix B (with entries in R) with AB = I = BA. (When R is a 
field, invertible is usually called nonsingular.) 

Definition. Let $A = [a_{ij}]$ be an $n \times n$ matrix with entries in a commutative
ring $R$. Then the adjoint⁸ of $A$ is the matrix

$$\operatorname{adj}(A) = [c_{ij}],$$

where

$$c_{ji} = (-1)^{i+j}\det(A_{ij})$$

and $A_{ij}$ denotes the $(n - 1) \times (n - 1)$ matrix obtained from $A$ by deleting its $i$th
row and $j$th column.

⁸ There is another notion of adjoint, unrelated to this notion, on page 371.

The reversing of indices is deliberate. In words, $\operatorname{adj}(A)$ is the transpose of
the matrix whose $ij$ entry is $(-1)^{i+j}\det(A_{ij})$. We often call $c_{ji}$ the $ij$-cofactor
of $A$.

Proposition 4.86. If $A$ is an $n \times n$ matrix with entries in a commutative ring
$R$, then

$$A\operatorname{adj}(A) = \det(A)I = \operatorname{adj}(A)A.$$

Proof. If $A = [a_{ij}]$, let us write $(A)_{ij} = a_{ij}$ in this proof. Thus, if $C = [c_{ij}]$,
then $(AC)_{ij} = \sum_k a_{ik}c_{kj}$. If we now define $C = [c_{ij}]$ by $c_{ij} = (-1)^{i+j}\det(A_{ji})$
[so that $C = \operatorname{adj}(A)$], then Laplace expansion across the $i$th row of $A$ gives

$$(AC)_{ii} = \sum_k (-1)^{i+k} a_{ik}\det(A_{ik}) = \det(A).$$

We pause a moment before computing $(AC)_{ij}$ for $j \neq i$. Define $M = [m_{pq}]$
to be the matrix obtained from $A$ by replacing its $j$th row $(a_{j1}, \ldots, a_{jn})$ by its
$i$th row $(a_{i1}, \ldots, a_{in})$; thus, $m_{jk} = a_{ik}$ for all $k$. Note that $M_{jk} = A_{jk}$ for
all $k$ (because $M$ and $A$ differ only in the $j$th row, which is deleted to obtain the
smaller matrices $M_{jk}$ and $A_{jk}$).

When $j \neq i$,

$$\begin{aligned}
(AC)_{ij} &= \sum_k a_{ik}(-1)^{j+k}\det(A_{jk}) \\
&= \sum_k (-1)^{j+k} m_{jk}\det(M_{jk}) \\
&= \det(M),
\end{aligned}$$

because $a_{ik} = m_{jk}$ and $A_{jk} = M_{jk}$. But $\det(M) = 0$, because two of its rows
are equal. Therefore, $A\operatorname{adj}(A) = AC$ is the scalar matrix having diagonal entries
all equal to $\det(A)$. •
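Because Proposition 4.86 holds over any commutative ring, it can be checked with plain integer arithmetic and no division. The sketch below is illustrative; the helper names are ours, and the determinant is computed by Laplace expansion across the first row, with the empty matrix given determinant 1 so that the $1 \times 1$ case needs no special treatment.

```python
def minor(A, i, j):
    """The matrix A_ij: delete row i and column j of A."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(A) if r != i]

def det(A):
    """Laplace expansion across the first row; the empty matrix has determinant 1."""
    if not A:
        return 1
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))

def adjoint(A):
    """adj(A): the (j, i) entry is (-1)^(i+j) det(A_ij); note the reversed indices."""
    n = len(A)
    return [[(-1) ** (i + j) * det(minor(A, i, j)) for i in range(n)]
            for j in range(n)]

def mat_mul(A, B):
    """Matrix product of two matrices given as lists of rows."""
    return [[sum(A[i][p] * B[p][j] for p in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[2, 0, 1], [1, 3, 2], [1, 1, 4]]
d = det(A)
scalar = [[d if i == j else 0 for j in range(3)] for i in range(3)]   # det(A) I
print(adjoint(A))
print(mat_mul(A, adjoint(A)) == scalar, mat_mul(adjoint(A), A) == scalar)
```

Both products are checked, since the proposition asserts the identity on each side.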

Corollary 4.87. If $A$ is an $n \times n$ matrix with entries in a commutative ring $R$,
then $A$ is invertible if and only if $\det(A)$ is a unit in $R$. Moreover,

$$\det(A^{-1}) = \det(A)^{-1}.$$

Proof. If $A$ is invertible, then there is a matrix $B$ with $AB = I$. By Eq. (1),
$1 = \det(I) = \det(AB) = \det(A)\det(B)$, so that $\det(A)$ is a unit in $R$. Conversely,
assume that $\det(A)$ is a unit in $R$. If $B = \det(A)^{-1}\operatorname{adj}(A)$, then Proposition 4.86
shows that $AB = I = BA$.

Now $I = AA^{-1}$ gives $1 = \det(I) = \det(AA^{-1}) = \det(A)\det(A^{-1})$.
Therefore, $\det(A^{-1}) = \det(A)^{-1}$. •

In the special case when $R$ is a field, we see that $A$ is invertible (i.e., $A$ is
nonsingular) if and only if $\det(A) \neq 0$ (a familiar result from a standard linear
algebra course). On the other hand, if $A$ is an $n \times n$ matrix with entries in $\mathbb{Z}$, then
$A$ is invertible if and only if $\det(A) = \pm 1$; that is, $A^{-1}$ has only integer entries
if and only if $\det(A) = \pm 1$. If $R = k[x]$, where $k$ is a field, then $A$ is invertible
if and only if its determinant is a nonzero constant.
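The contrast between a field and the ring $\mathbb{Z}$ shows up already for $2 \times 2$ matrices. In the sketch below (the matrices `U` and `N` and the helper names are hypothetical examples), both matrices are invertible over $\mathbb{Q}$, but only the one whose determinant is a unit in $\mathbb{Z}$ has an inverse with integer entries.

```python
from fractions import Fraction

def det2(M):
    """Determinant ad - bc of a 2 x 2 matrix."""
    (a, b), (c, d) = M
    return a * d - b * c

def inverse2(M):
    """Inverse over Q of a 2 x 2 integer matrix, computed as det(M)^(-1) adj(M)."""
    (a, b), (c, d) = M
    det = Fraction(det2(M))
    if det == 0:
        raise ValueError("determinant is 0, so M is not invertible over Q")
    return [[d / det, -b / det], [-c / det, a / det]]

def is_integral(M):
    """True if every entry of the rational matrix M is an integer."""
    return all(x.denominator == 1 for row in M for x in row)

U = [[2, 1], [1, 1]]   # det = 1, a unit in Z
N = [[2, 0], [0, 1]]   # det = 2, nonzero but not a unit in Z
print(inverse2(U), is_integral(inverse2(U)))
print(inverse2(N), is_integral(inverse2(N)))
```

The inverse is formed exactly as in the proof of Corollary 4.87, as $\det(A)^{-1}\operatorname{adj}(A)$, so the only possible source of denominators is $\det(A)^{-1}$.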


Proposition 4.88. Let $P$ and $M$ be $n \times n$ matrices with entries in a commutative
ring $R$. If $P$ is invertible, then

$$\det(PMP^{-1}) = \det(M).$$

Proof. We have $\det(P^{-1}) = \det(P)^{-1}$, by Corollary 4.87. Since determinants
lie in the commutative ring $R$,

$$\begin{aligned}
\det(PMP^{-1}) &= \det(P)\det(M)\det(P^{-1}) \\
&= \det(M)\det(P)\det(P^{-1}) = \det(M).
\end{aligned}$$ •

Corollary 4.89. Let $T : V \to V$ be a linear transformation on a vector space $V$
over a field $k$, and let $X$ and $Y$ be bases of $V$. If $A = {}_X[T]_X$ and $B = {}_Y[T]_Y$,
then $\det(A) = \det(B)$.

Proof. By Corollary 4.73, $A$ and $B$ are similar; that is, there is a nonsingular
(hence, invertible) matrix $P$ with $B = PAP^{-1}$. •

It follows from this corollary that every matrix associated to a linear transfor- 
mation T has the same determinant, and so we can now define the determinant 
of a linear transformation. 

Definition. If $T : V \to V$ is a linear transformation on a finite-dimensional
vector space $V$, then

$$\det(T) = \det(A),$$

where $A = {}_X[T]_X$ for some basis $X$ of $V$.

As we have just remarked, this definition does not depend on the choice of X. 
Perhaps the simplest linear transformations T : V V are the scalar trans- 
formations T = cl y; that is, there is a scalar c e k such that T (v) = cv for all 
v e V. We now ask, for an arbitrary linear transformation T : V — > V, whether 
there exists c e k with T (u) = cv for some v e V (of course, this can only be 
interesting if v 0 ). 

Definition. Let $T : V \to V$ be a linear transformation, where $V$ is a
finite-dimensional vector space over a field $k$. A scalar $c \in k$ is called an eigenvalue^9
of $T$ if there exists a nonzero vector $v$, called an eigenvector, with

$$T(v) = cv.$$


Proposition 4.90. Let $T : V \to V$ be a linear transformation, where $V$ is a
vector space over a field $k$, and let $c_1, \ldots, c_r$ be distinct eigenvalues of $T$ lying
in $k$. If $v_i$ is an eigenvector of $T$ for $c_i$, then the list $X = v_1, \ldots, v_r$ is linearly
independent.

Proof. We use induction on $r \geq 1$. The base step $r = 1$ is true, for any
nonzero vector is a linearly independent list of length 1, and eigenvectors are, by
definition, nonzero. For the inductive step, assume that

$$a_1 v_1 + \cdots + a_{r+1} v_{r+1} = 0.$$

Applying $T$ to this equation gives

$$a_1 c_1 v_1 + \cdots + a_{r+1} c_{r+1} v_{r+1} = 0.$$

Multiply the first equation by $c_{r+1}$, and then subtract from the second to obtain

$$a_1(c_1 - c_{r+1})v_1 + \cdots + a_r(c_r - c_{r+1})v_r = 0.$$

By the inductive hypothesis, $a_i(c_i - c_{r+1}) = 0$ for all $i \leq r$. Since all the
eigenvalues are distinct, $c_i - c_{r+1} \neq 0$, and so $a_i = 0$ for all $i \leq r$. The original
equation now reads $a_{r+1} v_{r+1} = 0$, and so $a_{r+1} = 0$, by the base step. Thus, all
the coefficients $a_i$ are zero, and $v_1, \ldots, v_{r+1}$ is linearly independent. •

If $T : V \to V$ is a linear transformation, then so is $c1_V - T$ for any $c \in k$.
Thus, if $v$ is an eigenvector of $T$ for the eigenvalue $c$, then $(c1_V - T)(v) = 0$; that is, $c1_V - T$
is singular. Conversely, if $c1_V - T$ is singular, then Corollary 4.80 provides a
nonzero vector $v$ with $(c1_V - T)(v) = 0$; that is, $T(v) = cv$. We have been
led to linear transformations of the form $c1_V - T$ for scalars $c \in k$, and this leads
us to consider matrices $xI - A$, where $A$ is a matrix representing $T$. Since we
have been treating matrices with entries in commutative rings, it is legitimate for
us to compute the determinant of $xI - A$, a matrix whose entries lie in $k[x]$. Of
course, $\det(xI - A)$ is a polynomial in $k[x]$.

9 The word eigenvalue is a partial translation of the original German word Eigenwert ( Wert 
means value). A translation of eigen is characteristic or proper, and one often sees char- 
acteristic value used instead of eigenvalue. This partial translation also explains the terms 
eigenvector and characteristic vector. 





Definition. If $A$ is an $n \times n$ matrix with entries in a field $k$, then its characteristic
polynomial^10 is

$$h_A(x) = \det(xI - A).$$


If $R$ is a commutative ring and $A$ is an $n \times n$ matrix with entries in $R$,
then $\det(A) \in R$. In particular, the entries of $xI - A$ lie in $k[x]$, and so
$h_T(x) = \det(xI - T) \in k[x]$; that is, the characteristic polynomial really is
a polynomial. Every eigenvalue of $T$ is a root of $h_T(x)$, but there may be roots
of the characteristic polynomial that do not lie in $k$. For example, $\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$ has no
real eigenvalues (its characteristic polynomial is $x^2 + 1$). Almost everyone
extends the definition of eigenvalue to include such roots (of course, nothing new
occurs if all the roots of $h_T(x)$ lie in $k$).

The next proposition will enable us to speak of the characteristic polynomial 
of a linear transformation. 


Proposition 4.91. If $A$ and $B$ are similar $n \times n$ matrices with entries in a field $k$,
then

$$\det(xI - A) = \det(xI - B),$$

and so similar matrices have the same eigenvalues (occurring with the same
multiplicities).

Proof. If $B = PAP^{-1}$, then

$$P(xI - A)P^{-1} = PxIP^{-1} - PAP^{-1} = xI - B.$$

Therefore, $\det(P(xI - A)P^{-1}) = \det(xI - B)$. But

$$\det(P(xI - A)P^{-1}) = \det(P)\det(xI - A)\det(P^{-1}) = \det(xI - A).$$ •

Thus, it makes sense to define $\det(xI - T)$ to be $\det(xI - A)$, where $A$ is
any matrix representing $T$.


Definition. If $T : V \to V$ is a linear transformation on a finite-dimensional
vector space $V$ over a field $k$, then an eigenvalue of $T$ is a root of its characteristic
polynomial $h_T(x)$.


10 No one calls the characteristic polynomial the eigenpolynomial. 

Remark. Regard $A = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$ as a matrix over $\mathbb{R}$. Its characteristic polynomial is
$h_A(x) = x^2 + 1$, and its eigenvalues are $\pm i$. Since these eigenvalues do not lie in
$\mathbb{R}$, there is no eigenvector in $\mathbb{R}^2$ for either of them; there do not exist real numbers
$a$ and $b$ (not both zero) with $\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \pm i \begin{bmatrix} a \\ b \end{bmatrix}$. However, if we regard $A$ as a complex matrix
(whose entries happen to be real), then we can find eigenvectors; for example,
$(1, i)^T$ is an eigenvector.

More generally, let $A$ be an $n \times n$ matrix over a field $k$, and let $T : k^n \to k^n$
be the linear transformation $T(x) = Ax$, where $x \in k^n$ is a column vector. By
Kronecker's theorem (Theorem 3.118), there is an extension field $K/k$ containing
all the roots of $h_A(x)$; that is, $K$ contains all the eigenvalues of $A$. Now
$T : K^n \to K^n$, defined by $T(x) = Ax$, is a linear transformation, where $x \in K^n$
is a column vector. Our original discussion of eigenvalues shows that if $c \in K$ is
an eigenvalue of $A$, then there is an eigenvector $v \in K^n$ with $T(v) = cv$. ◄

Proposition 4.92. If $T : V \to V$ is a linear transformation, where $\dim(V) = n$,
then $T$ has at most $n$ eigenvalues.

Proof. If $\dim(V) = n$, then $\deg(h_T) = n$, and the result follows from
Theorem 3.50. •

Definition. If $A = [a_{ij}]$ is an $n \times n$ matrix, then its trace is the sum of its
diagonal entries:

$$\operatorname{tr}(A) = \sum_{i=1}^{n} a_{ii}.$$

Proposition 4.93. Let $k$ be a field, and let $A$ be an $n \times n$ matrix with entries
in $k$. Then $h_A(x)$ is a monic polynomial of degree $n$. Moreover, the coefficient of
$x^{n-1}$ in $h_A(x)$ is $-\operatorname{tr}(A)$ and the constant term is $(-1)^n \det(A)$.

Proof. Let $A = [a_{ij}]$ and let $B = xI - A$; thus, $B = [b_{ij}] = [x\delta_{ij} - a_{ij}]$
(where $\delta_{ij}$ is the Kronecker delta). The complete expansion is

$$\det(B) = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma)\, b_{\sigma(1),1} b_{\sigma(2),2} \cdots b_{\sigma(n),n}.$$

If $\sigma$ is the identity $(1) \in S_n$, then the corresponding term in the complete
expansion of $B = xI - A$ is

$$(x - a_{11})(x - a_{22}) \cdots (x - a_{nn}) = \prod_i (x - a_{ii}),$$

a monic polynomial in $k[x]$ of degree $n$. If $\sigma \neq (1)$, then the $\sigma$th term in
the complete expansion cannot have exactly $n - 1$ factors from the diagonal of
$xI - A$, for if $\sigma$ fixes $n - 1$ indices, then $\sigma = (1)$. Therefore, the sum of the
terms over all $\sigma \neq (1)$ is either 0 or a polynomial in $k[x]$ of degree at most $n - 2$,
so that $\deg(h_A) = n$. By Exercise 3.99 on page 305, the coefficient of $x^{n-1}$ is
$-\sum_i a_{ii} = -\operatorname{tr}(A)$, and the constant term of $h_A(x)$ is $h_A(0) = \det(-A) =
(-1)^n \det(A)$. •


Corollary 4.94. If $A$ and $B$ are similar $n \times n$ matrices with entries in a field $k$,
then $A$ and $B$ have the same trace and the same determinant.

Proof. Now $A$ and $B$ have the same characteristic polynomial, by
Proposition 4.91, and so Proposition 4.93 applies to give $\operatorname{tr}(A) = \operatorname{tr}(B)$ and $\det(A) =
\det(B)$. •

Let A and B be similar matrices. Another proof that tr(A) = tr( B) is de- 
scribed in Exercise 4.50 on page 399, while Proposition 4.88 shows that det(A) = 
det(B). 
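
The invariance of trace and determinant under similarity can be checked numerically on a small case. The following is only an illustrative sketch: the matrices `A` and `P` are arbitrary choices (not taken from the text), the helper names are hypothetical, and exact rational arithmetic from Python's standard library is used so that no rounding occurs.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def det2(X):
    """Determinant of a 2 x 2 matrix."""
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]

def inv2(X):
    """Inverse of a 2 x 2 matrix with nonzero determinant (adjugate formula)."""
    d = Fraction(det2(X))
    return [[X[1][1] / d, -X[0][1] / d],
            [-X[1][0] / d, X[0][0] / d]]

def trace(X):
    """Sum of the diagonal entries."""
    return sum(X[i][i] for i in range(len(X)))

A = [[1, 2], [3, 4]]                  # illustrative matrix
P = [[2, 1], [1, 1]]                  # det(P) = 1, so P is invertible
B = matmul(matmul(P, A), inv2(P))     # B = P A P^{-1}

print(trace(A), trace(B))             # both traces agree
print(det2(A), det2(B))               # both determinants agree
```

Here $B$ has different entries from $A$, yet both invariants coincide, as Corollary 4.94 predicts.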

The next corollary interprets the trace as the sum (with multiplicities) of 
the eigenvalues and the determinant as the product (with multiplicities) of the 
eigenvalues. 

Corollary 4.95. Let $A = [a_{ij}]$ be an $n \times n$ matrix with entries in a field $k$, and
let $h_A(x) = \prod_{i=1}^{n} (x - \alpha_i)$ be its characteristic polynomial. Then

$$\operatorname{tr}(A) = \sum_i \alpha_i \quad \text{and} \quad \det(A) = \prod_i \alpha_i.$$

Proof. By Exercise 3.99 on page 305, if $f(x) = \sum_{j=0}^{n} c_j x^j \in k[x]$ is a monic
polynomial with $f(x) = \prod_{i=1}^{n} (x - \alpha_i)$, then $c_{n-1} = -\sum_j \alpha_j$ and $c_0 =
(-1)^n \prod_j \alpha_j$. This is true, in particular, for $f(x) = h_A(x)$. The result now
follows from Proposition 4.93, which identifies $c_{n-1}$ with $-\operatorname{tr}(A)$ and $c_0$ with
$(-1)^n \det(A)$ (in each case, the sign cancels). •


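Example 4.96.

Consider $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$ as a matrix in $\mathrm{Mat}_2(\mathbb{Q})$. Its characteristic polynomial is

$$h_A(x) = \det \begin{bmatrix} x - 1 & -2 \\ -3 & x - 4 \end{bmatrix} = x^2 - 5x - 2.$$

The eigenvalues $\tfrac{1}{2}(5 \pm \sqrt{33})$ of $A$ can be found by the quadratic formula. Note
that

$$\operatorname{tr}(A) = \tfrac{1}{2}(5 + \sqrt{33}) + \tfrac{1}{2}(5 - \sqrt{33}) = 5;$$
$$\det(A) = \tfrac{1}{2}(5 + \sqrt{33}) \cdot \tfrac{1}{2}(5 - \sqrt{33}) = -2.$$ ◄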
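
For readers who want to experiment, the coefficients of a characteristic polynomial can be computed exactly. The sketch below uses the Faddeev-LeVerrier recurrence, which is not the method of the text (the text uses the complete expansion of the determinant); it is swapped in here only because it is short. It divides by $1, 2, \ldots, n$, so it assumes a field of characteristic 0 such as $\mathbb{Q}$. The function name `char_poly` is hypothetical.

```python
from fractions import Fraction

def char_poly(A):
    """Coefficients [c_0, c_1, ..., c_n] of det(xI - A), with c_n = 1.

    Faddeev-LeVerrier recurrence: M_1 = I, c_{n-k} = -tr(A M_k) / k,
    M_{k+1} = A M_k + c_{n-k} I. Valid over fields of characteristic 0.
    """
    n = len(A)
    A = [[Fraction(a) for a in row] for row in A]
    identity = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    coeffs = [Fraction(0)] * n + [Fraction(1)]
    M = identity
    for k in range(1, n + 1):
        AM = [[sum(A[i][t] * M[t][j] for t in range(n)) for j in range(n)]
              for i in range(n)]
        c = -sum(AM[i][i] for i in range(n)) / k
        coeffs[n - k] = c
        M = [[AM[i][j] + c * identity[i][j] for j in range(n)]
             for i in range(n)]
    return coeffs

A = [[1, 2], [3, 4]]
h = char_poly(A)          # coefficients of x^2 - 5x - 2, lowest degree first
n = len(A)
trace_A = sum(A[i][i] for i in range(n))
det_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]

# Proposition 4.93: coefficient of x^{n-1} is -tr(A); constant term is (-1)^n det(A).
print(h[n - 1] == -trace_A, h[0] == (-1) ** n * det_A)
```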





If scalar matrices are the simplest matrices, then diagonal matrices are the
next simplest, where an $n \times n$ matrix $D = [d_{ij}]$ is diagonal if all its off-diagonal
entries are zero; that is, $d_{ij} = 0$ for $i \neq j$.


Definition. An n x n matrix A is diagonalizable if A is similar to a diagonal 
matrix. 

Of course, every diagonal matrix is diagonalizable. 


Proposition 4.97.

(i) An $n \times n$ matrix $A$ over a field $k$ is diagonalizable if and only if there is a
basis of $k^n$ comprised of eigenvectors of $A$.

(ii) If $A$ is similar to a diagonal matrix $D$, then the diagonal entries of $D$ are
the eigenvalues of $A$ (with the same multiplicities).

Proof.

(i) As usual, define a linear transformation $T : k^n \to k^n$ by $T(v) = Av$. If $A$
is similar to a diagonal matrix $D = [d_{ij}]$, then there is a basis $X = v_1, \ldots, v_n$
of $k^n$ with $D = {}_X[T]_X$; that is, $T(v_j) = d_{1j}v_1 + \cdots + d_{nj}v_n$. Since $D$ is
diagonal, however, we have $T(v_j) = d_{jj}v_j$, and so the basis $X$ is comprised of
eigenvectors. (All the $v_j$ are nonzero, for 0 is never a part of a basis.)

Conversely, let $X = v_1, \ldots, v_n$ be a basis of $k^n$ comprised of
eigenvectors; say, $T(v_j) = c_j v_j$ for all $j$. The $j$th column of $B = {}_X[T]_X$ is
$[0, \ldots, 0, c_j, 0, \ldots, 0]^T$, for all $j$, and so $B$ is a diagonal matrix with
diagonal entries $c_1, \ldots, c_n$. Finally, $A$ and $B$ are similar, for both represent the linear
transformation $T$ relative to different bases.

(ii) If $D$ is a diagonal matrix with diagonal entries $d_{ii}$, then $\det(xI - D) =
\prod_i (x - d_{ii})$, and so the eigenvalues of $D$ are its diagonal entries. Since $A$ and
$D$ are similar, Proposition 4.91 shows that they have the same eigenvalues (with
the same multiplicities). •
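
A small numerical instance may make part (i) concrete. In the sketch below, the matrix `A` and its eigenpairs are illustrative choices, not taken from the text. The eigenvectors are placed as the columns of `P`, and the check is that $AP = PD$; when $\det(P) \neq 0$ this is the same as saying $P^{-1}AP = D$, so $A$ is similar to the diagonal matrix $D$.

```python
def matmul(X, Y):
    """Product of matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

A = [[2, 1], [1, 2]]
# Illustrative eigenpairs (c_j, v_j) with A v_j = c_j v_j.
eigenpairs = [(3, [1, 1]), (1, [1, -1])]

n = len(A)
# P has the eigenvectors as its columns; D has the eigenvalues on its diagonal.
P = [[eigenpairs[j][1][i] for j in range(n)] for i in range(n)]
D = [[eigenpairs[j][0] if i == j else 0 for j in range(n)] for i in range(n)]

# A nonzero determinant means the eigenvectors form a basis of Q^2.
det_P = P[0][0] * P[1][1] - P[0][1] * P[1][0]

print(matmul(A, P) == matmul(P, D), det_P != 0)
```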


Corollary 4.98. Let $A$ be an $n \times n$ matrix over a field $k$ which contains all the
eigenvalues of $A$. If the characteristic polynomial of $A$ has no repeated roots,
then $A$ is diagonalizable.

Proof. Since $\deg(h_A) = n$, the hypothesis gives $n$ distinct eigenvalues $c_1, \ldots, c_n$.
As these eigenvalues all lie in $k$, there are corresponding eigenvectors $v_1, \ldots, v_n$
in $k^n$; that is, $Av_i = c_i v_i$. By Proposition 4.90, the list $v_1, \ldots, v_n$ is linearly
independent, and hence it is a basis of $k^n$. The result now follows from
Proposition 4.97. •

The converse of Corollary 4.98 is false. For example, the $2 \times 2$ identity
matrix $I$ is obviously diagonalizable (it is actually diagonal), yet its characteristic
polynomial $x^2 - 2x + 1$ has repeated roots.


Example 4.99.

Let $A = \begin{bmatrix} a & b \\ b & c \end{bmatrix}$ be a real (symmetric) matrix. We claim that the eigenvalues of $A$
are real. Now $h_A(x) = x^2 - (a + c)x + (ac - b^2)$. By the quadratic formula,
the eigenvalues are

$$x = \tfrac{1}{2}\left[(a + c) \pm \sqrt{(a + c)^2 - 4(ac - b^2)}\right].$$

But $(a + c)^2 - 4(ac - b^2) = (a - c)^2 + 4b^2 \geq 0$, and so its square root is real.
Thus, the eigenvalues $x$ are real. ◄
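
The computation in Example 4.99 is easy to try out. The following sketch (the function name is hypothetical) evaluates the quadratic formula for a real symmetric $2 \times 2$ matrix, writing the discriminant in the sum-of-squares form $(a - c)^2 + 4b^2$ so that the square root is always defined.

```python
import math

def symmetric_eigenvalues(a, b, c):
    """Eigenvalues of the real symmetric matrix [[a, b], [b, c]].

    The discriminant (a + c)^2 - 4(ac - b^2) is computed in the
    equivalent form (a - c)^2 + 4b^2, which is visibly nonnegative.
    """
    disc = (a - c) ** 2 + 4 * b * b
    root = math.sqrt(disc)
    return ((a + c) - root) / 2, ((a + c) + root) / 2

print(symmetric_eigenvalues(2, 1, 2))
print(symmetric_eigenvalues(1, 2, 1))
```

Note that no `ValueError` from `math.sqrt` can occur for real inputs, which is the content of the example.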

It is not clear how to generalize the argument in Example 4.99 for $n \geq 3$,
but the result is true: the eigenvalues of a symmetric real matrix are real. The 
theorem is called the principal axis theorem because of an application of it to 
find normal forms for (higher dimensional) conic sections. 

Recall Example 4.66: if $T : \mathbb{R}^n \to \mathbb{R}^n$ is a linear transformation, then its
adjoint is the linear transformation $T^* : \mathbb{R}^n \to \mathbb{R}^n$ such that

$$(Tu, v) = (u, T^*v)$$

for all $u, v \in \mathbb{R}^n$, where $(u, v)$ is the usual inner product. This example also
shows that if $A = {}_E[T]_E$, then ${}_E[T^*]_E = A^T$. Hence, if $A$ is a symmetric
matrix, then $T = T^*$.

Since eigenvalues of a real matrix may be complex, we begin by
extending the inner product on $\mathbb{R}^n$ (almost) to an inner product on $\mathbb{C}^n$.

Definition. If $u = (a_1, \ldots, a_n)$, $v = (c_1, \ldots, c_n) \in \mathbb{C}^n$, define

$$(u, v) = a_1\overline{c_1} + \cdots + a_n\overline{c_n}.$$


It is easy to check that $(u + u', v) = (u, v) + (u', v)$ and that $(qu, v) =
q(u, v)$, where $u, u', v \in \mathbb{C}^n$ and $q \in \mathbb{C}$. However, this is not an inner product
because it is not symmetric. Instead of $(v, u) = (u, v)$, we now have $(v, u) =
\overline{(u, v)}$. It follows that

$$(u, qv) = \overline{q}(u, v).$$

Theorem 4.100 (Principal Axis Theorem). If $A$ is an $n \times n$ real symmetric
matrix, then all its eigenvalues are real, and $A$ is diagonalizable.

Proof. Define $T : \mathbb{C}^n \to \mathbb{C}^n$ by $T(u) = Au$. Since $A$ is a symmetric matrix,
we see that $T = T^*$; that is, for all $u, v \in \mathbb{C}^n$,

$$(Tu, v) = (u, Tv).$$

We first prove that all the eigenvalues of $A$ (and $T$) are real. Let $c$ be an
eigenvalue of $T$, so that there is a nonzero $v \in \mathbb{C}^n$ with $Tv = cv$ (these exist because
$\mathbb{C}$ is algebraically closed). We evaluate $(Tv, v)$ in two ways. On the one hand,
$(Tv, v) = (cv, v) = c(v, v)$. On the other hand, since $T = T^*$, we have
$(Tv, v) = (v, Tv) = (v, cv) = \overline{c}(v, v)$. Now $(v, v) \neq 0$ because $v \neq 0$, so that
$c = \overline{c}$; that is, $c$ is real.

We prove that $A$ is diagonalizable by induction on $n \geq 1$. Since the base
step $n = 1$ is obvious, we proceed to the inductive step. Choose an eigenvalue
$c$ of $A$; since $c$ is real, there is an eigenvector $v \in \mathbb{R}^n$ with $Av = cv$. Note that
$\mathbb{R}^n = \langle v \rangle \oplus \langle v \rangle^{\perp}$, by Exercise 4.20 on page 345, so that $\dim(\langle v \rangle^{\perp}) = n - 1$. We
claim that $T(\langle v \rangle^{\perp}) \subseteq \langle v \rangle^{\perp}$. If $w \in \langle v \rangle^{\perp}$, then $(w, v) = 0$; we must show that
$(Tw, v) = 0$. Now $(Tw, v) = (w, T^*v) = (w, Tv)$, since $A$ is symmetric. But
$(w, Tv) = (w, cv) = c(w, v) = 0$, so that $Tw \in \langle v \rangle^{\perp}$, as desired. If $T'$ is the
restriction of $T$ to $\langle v \rangle^{\perp}$, then $T' : \langle v \rangle^{\perp} \to \langle v \rangle^{\perp}$. Since $(Tu, w) = (u, Tw)$ for
all $u, w \in \mathbb{R}^n$, we have the equation $(T'u, w) = (u, T'w)$, in particular, for all
$u, w \in \langle v \rangle^{\perp}$. Therefore, the inductive hypothesis applies, and $T'$ is
diagonalizable. Proposition 4.97 says that there is a basis $v_2, \ldots, v_n$ of $\langle v \rangle^{\perp}$ consisting of
eigenvectors of $T'$, hence of $T$. But $v, v_2, \ldots, v_n$ is a basis of $\mathbb{R}^n = \langle v \rangle \oplus \langle v \rangle^{\perp}$,
and so $A$ is diagonalizable (using Proposition 4.97 again). •

Example 4.101.

Here is an example of an $n \times n$ matrix that is not diagonalizable. View $R_\psi$, a
rotation about the origin by $\psi$ (where $\psi \neq 0°$ and $\psi \neq 180°$), as a function
$\mathbb{C} \to \mathbb{C}$ (instead of as a function $\mathbb{R}^2 \to \mathbb{R}^2$). Now $R_\psi$ rotates every line $\ell =
\{re^{i\theta} : r \in \mathbb{R}\}$ to the new line $\{re^{i(\theta + \psi)} : r \in \mathbb{R}\}$. But if $v = e^{i\theta}$ is an eigenvector
of $R_\psi$, then $R_\psi$ would carry the line $\ell = \{re^{i\theta} : r \in \mathbb{R}\}$ into itself. ◄

To understand why some matrices are not diagonalizable, it is best to
consider the more general question when two arbitrary $n \times n$ matrices are similar.
If $V$ is a finite-dimensional vector space over a field $k$, a linear
transformation $T : V \to V$ yields many matrices, all of the form ${}_X[T]_X$ for some basis
$X = v_1, \ldots, v_n$ of $V$. By Corollary 4.73, if $A$ and $B$ are two such matrices, say,
$A = {}_X[T]_X$ and $B = {}_Y[T]_Y$ (where $X$ and $Y$ are bases of $V$), then $A$ and $B$ are
similar; that is, there is a nonsingular matrix $P = {}_Y[1_V]_X$ with $B = PAP^{-1}$.
The basic idea is, given a matrix $A$, to find a "simplest" matrix $C$ similar to $A$;
such a matrix $C$ will be called a canonical form for $A$. One's first candidate
for a canonical form is a diagonal matrix, but Example 4.101 says this is not
adequate. It turns out that every matrix has two canonical forms: its rational
canonical form and its Jordan canonical form. The first type is built from
companion matrices (see Exercise 4.53 on page 399); the second type is built from
Jordan blocks, which are matrices of the form $cI + L$, where $L$ consists of all
0's except for 1's just above the diagonal.

Here are some theorems proved using canonical forms. 

Theorem. Let $A$ and $B$ be $n \times n$ matrices over a field $k$ and let $K/k$ be a field
extension. If $A$ and $B$ are similar over $K$, then they are similar over $k$. That
is, if there is a nonsingular matrix $P$ over $K$ with $PAP^{-1} = B$, then there is a
nonsingular matrix $Q$ over $k$ with $QAQ^{-1} = B$.

For example, two real matrices that are similar over C must be similar over 
the reals. 

Theorem. Every n x n matrix is similar to its transpose. 

Theorem (Cayley-Hamilton). Let $A$ be an $n \times n$ matrix with characteristic
polynomial $h_A(x) = c_0 + c_1 x + c_2 x^2 + \cdots + x^n$. Then

$$c_0 I + c_1 A + c_2 A^2 + \cdots + A^n = 0.$$

There are proofs of the Cayley-Hamilton theorem without using canonical 
forms (see, for example, Birkhoff-Mac Lane), but I prefer a proof that uses them. 
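
The statement of the Cayley-Hamilton theorem can at least be tested on examples. The sketch below (helper names are hypothetical) evaluates $c_0 I + c_1 A + \cdots + A^n$ by Horner's rule. The first matrix and its coefficients come from Example 4.96; the second is an illustrative upper triangular matrix, whose characteristic polynomial is $(x - 1)(x - 2)(x - 3) = x^3 - 6x^2 + 11x - 6$ because the determinant of a triangular matrix is the product of its diagonal entries.

```python
def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def poly_at_matrix(coeffs, A):
    """Evaluate c_0 I + c_1 A + ... + c_n A^n, where coeffs = [c_0, ..., c_n].

    Horner's rule: start from the zero matrix and repeatedly replace
    the running result R by R A + c I, highest coefficient first.
    """
    n = len(A)
    result = [[0] * n for _ in range(n)]
    for c in reversed(coeffs):
        result = matmul(result, A)
        for i in range(n):
            result[i][i] += c
    return result

A = [[1, 2], [3, 4]]                      # h_A(x) = x^2 - 5x - 2
print(poly_at_matrix([-2, -5, 1], A))     # the zero matrix

T = [[1, 1, 0], [0, 2, 1], [0, 0, 3]]     # h_T(x) = x^3 - 6x^2 + 11x - 6
print(poly_at_matrix([-6, 11, -6, 1], T))
```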


Exercises 

*4.40 Let $R$ be a commutative ring, let $D : \mathrm{Mat}_n(R) \to R$ be a determinant function,
and let $A$ be an $n \times n$ matrix with rows $\alpha_1, \ldots, \alpha_n$. Define $d_i : R^n \to R$ by
$d_i(\beta) = D(\alpha_1, \ldots, \alpha_{i-1}, \beta, \alpha_{i+1}, \ldots, \alpha_n)$.

(i) If $i \neq j$ and $r \in R$, prove that

$$d_i(r\alpha_j) = 0.$$

(ii) If $i \neq j$ and $r \in R$, prove that $d_i(\alpha_i + r\alpha_j) = D(A)$.

(iii) If $r_j \in R$, prove that

$$d_i\Big(\alpha_i + \sum_{j \neq i} r_j \alpha_j\Big) = D(A).$$

4.41 If $O$ is an orthogonal matrix, prove that $\det(O) = \pm 1$.

*4.42 If $A'$ is obtained from an $n \times n$ matrix $A$ by interchanging two of its rows, prove that
$\det(A') = -\det(A)$.

4.43 If $A$ is an $n \times n$ matrix over a commutative ring $R$ and if $r \in R$, prove that
$\det(rA) = r^n \det(A)$. In particular, $\det(-A) = (-1)^n \det(A)$.

*4.44 If $A = [a_{ij}]$ is an $n \times n$ triangular matrix, prove that

$$\det(A) = a_{11} a_{22} \cdots a_{nn}.$$


*4.45 If $u_0, \ldots, u_n$ is a list in a field $k$, then the corresponding Vandermonde matrix is
the $(n + 1) \times (n + 1)$ matrix

$$V = V(u_0, \ldots, u_n) = \begin{bmatrix} 1 & u_0 & u_0^2 & \cdots & u_0^n \\ 1 & u_1 & u_1^2 & \cdots & u_1^n \\ 1 & u_2 & u_2^2 & \cdots & u_2^n \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & u_n & u_n^2 & \cdots & u_n^n \end{bmatrix}.$$

(i) Prove that

$$\det(V) = \prod_{i < j} (u_j - u_i).$$

Conclude that $V$ is nonsingular if all the $u_i$ are distinct.

(ii) If $\omega$ is a primitive $n$th root of unity ($\omega^n = 1$ and $\omega^i \neq 1$ for $0 < i < n$), prove
that $V(1, \omega, \omega^2, \ldots, \omega^{n-1})$ is nonsingular and that

$$V(1, \omega, \omega^2, \ldots, \omega^{n-1})^{-1} = \tfrac{1}{n} V(1, \omega^{-1}, \omega^{-2}, \ldots, \omega^{-n+1}).$$

(iii) Let $f(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n \in k[x]$, and let $y_i = f(u_i)$.
Prove that the coefficient vector $a = (a_0, \ldots, a_n)$ is a solution of the
linear system

$$Vx = y, \qquad (4)$$

where $y = (y_0, \ldots, y_n)$. Conclude that if all the $u_i$ are distinct, then
$f(x)$ is determined by Eq. (4).

4.46 Define a tridiagonal matrix to be an $n \times n$ matrix of the form

$$T[x_1, \ldots, x_n] = \begin{bmatrix} x_1 & 1 & 0 & 0 & \cdots & 0 & 0 & 0 \\ -1 & x_2 & 1 & 0 & \cdots & 0 & 0 & 0 \\ 0 & -1 & x_3 & 1 & \cdots & 0 & 0 & 0 \\ \vdots & & \ddots & \ddots & \ddots & & & \vdots \\ 0 & 0 & 0 & 0 & \cdots & x_{n-2} & 1 & 0 \\ 0 & 0 & 0 & 0 & \cdots & -1 & x_{n-1} & 1 \\ 0 & 0 & 0 & 0 & \cdots & 0 & -1 & x_n \end{bmatrix}.$$

(i) If $D_n = \det(T[x_1, \ldots, x_n])$, prove that $D_1 = x_1$, $D_2 = x_1 x_2 + 1$, and,
for all $n > 2$,

$$D_n = x_n D_{n-1} + D_{n-2}.$$

(ii) Prove that if all $x_i = 1$, then $D_n = F_{n+1}$, the $(n+1)$st Fibonacci number.
(Recall that $F_0 = 0$, $F_1 = 1$, and $F_n = F_{n-1} + F_{n-2}$ for all $n \geq 2$.)

4.47 Let $k$ be a field and let $k^\times$ be its multiplicative group of nonzero elements. Prove
that $\det : \mathrm{GL}(n, k) \to k^\times$ is a surjective group homomorphism whose kernel is
$\mathrm{SL}(n, k)$. Conclude that $\mathrm{GL}(n, k)/\mathrm{SL}(n, k) \cong k^\times$.

4.48 Let R be a commutative ring, let A be an n × n matrix with entries in R, let B be an
m x m matrix, and let D be the in + m) x in + m) matrix D = [ n g ]• Prove that 
det(D) = det(A) det(B).

*4.49 If $A$ is an $m \times n$ matrix over a field $k$, prove that $\operatorname{rank}(A) \geq d$ if and only if $A$ has
a nonsingular $d \times d$ submatrix. Conclude that $\operatorname{rank}(A)$ is the maximum such $d$.

*4.50 (i) If $A$ and $B$ are $n \times n$ matrices with entries in a commutative ring $R$, prove
that $\operatorname{tr}(AB) = \operatorname{tr}(BA)$.

(ii) Using part (i) of this exercise, give another proof of Corollary 4.94: if $A$
and $B$ are similar matrices with entries in a field $k$, then $\operatorname{tr}(A) = \operatorname{tr}(B)$.

*4.51 If $k$ is a field and $A$ is an $n \times n$ matrix with entries in $k$, then we saw, in
Exercise 3.38 on page 248, that the map $\varphi : k[x] \to k[A]$, defined by $f(x) \mapsto f(A)$, is
a surjective ring homomorphism.

(i) Prove that $\ker \varphi = (m_A)$. Conclude that $k[A] \cong k[x]/(m_A)$.

(ii) If $k$ is an algebraically closed field, prove that the following statements
are equivalent:

1. $k[A]$ is a field;

2. $k[A]$ is a domain;

3. $A$ is a scalar matrix.

*4.52 Let $A_1, \ldots, A_t$ be square matrices with entries in a commutative ring $R$. Prove
that $\det(A_1 \oplus \cdots \oplus A_t) = \det(A_1) \cdots \det(A_t)$, where the direct sum of an $n \times n$
matrix $A$ and an $m \times m$ matrix $B$ is defined to be the $(m + n) \times (m + n)$ matrix

$$A \oplus B = \begin{bmatrix} A & 0 \\ 0 & B \end{bmatrix}.$$

*4.53 If $g(x) = x + c_0$, then its companion matrix $C(g)$ is the $1 \times 1$ matrix $[-c_0]$; if
$s \geq 2$ and $g(x) = x^s + c_{s-1}x^{s-1} + \cdots + c_1 x + c_0$, then its companion matrix
$C(g)$ is the $s \times s$ matrix

$$\begin{bmatrix} 0 & 0 & 0 & \cdots & 0 & -c_0 \\ 1 & 0 & 0 & \cdots & 0 & -c_1 \\ 0 & 1 & 0 & \cdots & 0 & -c_2 \\ 0 & 0 & 1 & \cdots & 0 & -c_3 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 1 & -c_{s-1} \end{bmatrix}.$$

If $C = C(g)$ is the companion matrix of $g(x) \in k[x]$, prove that the characteristic
polynomial $h_C(x) = \det(xI - C) = g(x)$.

*4.54 If $A_1, \ldots, A_t$ and $B_1, \ldots, B_t$ are square matrices with $A_i$ similar to $B_i$ for all $i$,
prove that $A_1 \oplus \cdots \oplus A_t$ is similar to $B_1 \oplus \cdots \oplus B_t$. (The direct sum of matrices
is defined in Exercise 4.52.)

4.55 Prove that an $n \times n$ matrix $A$ with entries in a field $k$ is singular if and only if 0 is
an eigenvalue of $A$.

4.56 Let $A$ be an $n \times n$ matrix over a field $k$. If $c$ is an eigenvalue of $A$, prove, for all
$m \geq 1$, that $c^m$ is an eigenvalue of $A^m$.

4.57 Find all possible eigenvalues of $n \times n$ matrices $A$ over $\mathbb{R}$ for which $A$ and $A^2$ are
similar.

4.58 An $n \times n$ matrix $N$ is called nilpotent if $N^m = 0$ for some $m \geq 1$. Prove that all
the eigenvalues of a nilpotent matrix are 0. (The converse is also true, but it uses
the Cayley-Hamilton theorem: if all the eigenvalues of a matrix $A$ are 0, then $A$ is
nilpotent.)

4.59 If $N$ is a nilpotent matrix, prove that $I + N$ is nonsingular.


4.5 Codes 

When we discussed codes earlier, in Example 1.73, our emphasis was on se- 
curity: how can we guarantee that a message not be decoded by unauthorized 
readers? We now leave the world of spies in order to consider the accuracy of a 
received message. Suppose that Pat asks Mike for Ella’s phone number, but that a 
dog barks as Mike answers. Because of this noise, Pat isn’t sure whether he heard 
the number correctly, and he asks Mike to repeat the number. Most likely, one or 
two repetitions will ensure that Pat will have Ella’s number. But simply repeating 
a message several times may not be practical. A better paradigm involves sci- 
entists on Earth wanting to see photographs sent from Mars or Saturn. In 2004, 
robotic cameras sent to these planets encoded each photograph as a bitstring in 
the following way. A photograph is divided into $1024 \times 1024$ pixels $p_{ij}$ (these are
the numbers actually used); thus, there are $2^{10} \times 2^{10} = 2^{20} = 1{,}048{,}576$ pixels.
Each pixel $p_{ij}$ is equipped with a 12-digit binary number $c_{ij}$ describing its
color, intensity, etc. The $2^{10} \times 2^{10}$ matrix $[c_{ij}]$ is then written as a single bitstring:
row 1, row 2, ..., row 1024. In this way, one photograph is converted into a
message having roughly 12,000,000 bits. This binary number must be transmitted
across space, and space is “noisy” because cosmic rays interfere with electronic 
signals. Clearly, it is not economical to send such a long message across space 
several times and, even if it were sent repeatedly, it is most likely that no two 
of the received messages would be identical. Still, as we saw with Ella’s phone 
number, it is natural to repeat, and redundancy is the key to accurate reception. 
We seek some practical ways of encoding messages efficiently so that mistakes 
in the received message can be detected and, even better, corrected. This is what 
is done to enable us to see reasonably faithful photographs sent from the planets. 


Block Codes 

Sending a message over a noisy channel involves three steps: encoding the
message (incorporating some redundancy); transmitting the message; decoding the
received message. This mathematical study of codes began, in the 1940s, with
the work of C. E. Shannon, R. W. Hamming, and M. J. E. Golay.

Notation. We will usually denote the finite field $\mathbb{F}_2$ by $\mathbb{B}$ in this section.


Definition. Call a finite set $\mathcal{A}$ an alphabet, and call its elements letters. If $m$
and $n$ are positive integers, then an encoding function is an injective function
$E : \mathcal{A}^m \to \mathcal{A}^n$. The elements $w = (a_1, \ldots, a_m) \in \mathcal{A}^m$ are called words, the
set $C = \operatorname{im} E \subseteq \mathcal{A}^n$ is called an $[n, m]$-block code^11 over $\mathcal{A}$, and the elements
of $C$ are called codewords. (If $\mathcal{A} = \mathbb{B}$, then an $[n, m]$-block code is called a
binary code.) A function $T : \mathcal{A}^n \to \mathcal{A}^n$ is called a transmission function, and a
function $D : \mathcal{A}^n \to \mathcal{A}^m$ is called a decoding function.

Since encoding involves redundancy, it is usually the case that $m < n$. The
choice of $m$ is not restrictive, for any long message can be subdivided into shorter
subwords of lengths $\leq m$. A transmission function $T$ may be sending a
photograph from outer space to Earth. Of course, we want to read the original message.
Were there no noise in the transmission, then any codeword $c = E(w)$ could be
decoded as $w = E^{-1}(c)$, for encoding functions are injective. Since errors may
be introduced, however, the task is to equip a code with sufficient redundancy so
that one can recapture a codeword from its transmitted version; that is, we want
$D \circ T \circ E = 1_{\mathcal{A}^m}$.


Example 4.102. 


(i) Parity Check $[m + 1, m]$-Code

Define an encoding function $E : \mathbb{B}^m \to \mathbb{B}^{m+1}$ as follows: if $w =
(a_1, \ldots, a_m) \in \mathbb{B}^m$, then

$$E(w) = (a_1, \ldots, a_m, b),$$

where $b = \sum_{i=1}^{m} a_i$. It is clear that $E$ is injective, and it is easy to check
that the code $C = \operatorname{im} E \subseteq \mathbb{B}^{m+1}$ is given by

$$C = \{(b_1, \ldots, b_{m+1}) \in \mathbb{B}^{m+1} : b_1 + \cdots + b_{m+1} = 0\}.$$

11 This is called a block code because all codewords have the same length, namely, $n$; it is
not difficult to modify block codes to allow words of varying length.

If $w \in \mathbb{B}^m$, then the parity of $E(w)$ is even in the sense that the sum of its
coordinates in $\mathbb{B}$ is 0. If one receives a message of odd parity (the
coordinate sum is 1), then one knows that there has been a mistake in
transmission. Thus, this code can detect the presence of single errors; however, a
double error in the received message cannot be detected because the parity
is unchanged if, for example, two 0s are replaced by two 1s.

(ii) Triple Repetition [12, 4]-Code

Consider the encoding function $E : \mathbb{B}^4 \to \mathbb{B}^{12}$ defined by $E(w) =
(w, w, w)$; that is, $C$ consists of all words of the form

$$E(a_1, a_2, a_3, a_4) = (a_1, a_2, a_3, a_4, a_1, a_2, a_3, a_4, a_1, a_2, a_3, a_4).$$

Now let $T : \mathbb{B}^{12} \to \mathbb{B}^{12}$ be a transmission function, and let the received
message be $(r_1, \ldots, r_{12})$. Because of interference, it is possible that

$$TE(a_1, a_2, a_3, a_4) = (r_1, \ldots, r_{12}) \neq E(a_1, a_2, a_3, a_4).$$

Define a decoding function $D : \mathbb{B}^{12} \to \mathbb{B}^4$ by

$$D(r_1, \ldots, r_{12}) = (b_1, b_2, b_3, b_4)$$

as follows. If there were no errors in transmission, then $r_1 = r_5 = r_9$.
In any event, since there are only two possible values for each $r_i$, at least
two of these must be the same; define $b_1$ to be this value. Similarly, define
$b_2$, $b_3$, and $b_4$. (Messages must be decoded, and so we do not consider a
double repetition [8, 4]-code, for there is no natural candidate for decoding
$(r_1, \ldots, r_8)$ if, say, $r_1 \neq r_5$.) Notice that this coding scheme can detect
errors: if $r_i$, $r_{i+4}$, $r_{i+8}$ are not all equal, then there has been an error (alas,
a really bad error may not be detected). Indeed, this code can correct an
error in the sense that the decoding function replaces a mistake.

(iii) Two-dimensional Parity [9, 4]-Code

Let us write a word $w = (r_1, \ldots, r_9) \in \mathbb{B}^9$ as a $3 \times 3$ matrix

$$\begin{bmatrix} r_1 & r_2 & r_3 \\ r_4 & r_5 & r_6 \\ r_7 & r_8 & r_9 \end{bmatrix}.$$

Consider the encoding function $E : \mathbb{B}^4 \to \mathbb{B}^9$ defined by

$$E(a, b, c, d) = \begin{bmatrix} a & b & r_3 \\ c & d & r_6 \\ r_7 & r_8 & r_9 \end{bmatrix} = \begin{bmatrix} a & b & a + b \\ c & d & c + d \\ a + c & b + d & a + b + c + d \end{bmatrix}.$$

Thus, $r_3$ and $r_6$ are parity checks for the first two rows, $r_7$ and $r_8$ are parity
checks for the first two columns, and $r_9$ is the parity check for $r_7$ and $r_8$
as well as for $r_3$ and $r_6$. We have constructed a [9, 4]-code $C = \operatorname{im} E$
consisting of all $3 \times 3$ matrices over $\mathbb{B}$ whose rows and columns are even.
Suppose that a received matrix is

$$\begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 0 \end{bmatrix}.$$

We see that the second row has an error, as does the first column, for their
entries do not sum to 0 in $\mathbb{B}$. Hence, an error in the 2,1 entry has been
detected, and it can be corrected. We now show that 2 errors can be detected.
For example, suppose the received matrix is

$$\begin{bmatrix} 1 & 0 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix}.$$

The parity checks on the rows are correct, but the parity checks detect
errors in the first two columns.

Comparing the triple repetition [12, 4]-code with this one, we see that
the present code is more efficient in that a word of length 4 is encoded into
a word of length 9 instead of a longer word of length 12. ◄
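
The three codes of Example 4.102 are simple enough to program directly. The following is a minimal sketch, assuming bits are represented by the Python integers 0 and 1 and that arithmetic in $\mathbb{B}$ is arithmetic mod 2; all function names are hypothetical. The last function reports which rows and columns of a received $3 \times 3$ matrix fail their parity checks.

```python
def parity_encode(w):
    """[m+1, m] parity check code: append the mod-2 sum of the bits."""
    return tuple(w) + (sum(w) % 2,)

def parity_ok(r):
    """True if the received word has even parity."""
    return sum(r) % 2 == 0

def triple_encode(w):
    """Triple repetition code: E(w) = (w, w, w)."""
    return tuple(w) * 3

def triple_decode(r):
    """Majority vote among r_i, r_{i+m}, r_{i+2m}, where m = len(r) / 3."""
    m = len(r) // 3
    return tuple(int(r[i] + r[i + m] + r[i + 2 * m] >= 2) for i in range(m))

def grid_encode(a, b, c, d):
    """Two-dimensional parity [9, 4]-code, returned as a 3 x 3 matrix."""
    return [[a, b, (a + b) % 2],
            [c, d, (c + d) % 2],
            [(a + c) % 2, (b + d) % 2, (a + b + c + d) % 2]]

def grid_syndrome(R):
    """Indices (from 0) of rows and columns whose entries do not sum to 0."""
    bad_rows = [i for i in range(3) if sum(R[i]) % 2]
    bad_cols = [j for j in range(3) if sum(R[i][j] for i in range(3)) % 2]
    return bad_rows, bad_cols

# The two received matrices discussed in part (iii).
print(grid_syndrome([[1, 0, 1], [1, 1, 1], [1, 1, 0]]))   # one bad row, one bad column
print(grid_syndrome([[1, 0, 1], [1, 0, 1], [1, 1, 0]]))   # rows fine, two bad columns
```

A single bad row together with a single bad column pinpoints the entry to flip; any other nonempty pattern signals an error without locating it.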

We are going to measure the distance between words in $\mathcal{A}^n$.


Figure 4.8 The Triangle Inequality


Definition. If $X$ is a set, then a metric on $X$ is a function $\delta : X \times X \to \mathbb{R}$ such
that:

(i) $\delta(a, b) \geq 0$ for all $a, b \in X$, and $\delta(a, b) = 0$ if and only if $a = b$;

(ii) $\delta(a, b) = \delta(b, a)$ for all $a, b \in X$;

(iii) Triangle inequality

$$\delta(a, b) \leq \delta(a, c) + \delta(c, b) \text{ for all } a, b, c \in X.$$

A metric has the essential properties that any self-respecting notion of
distance should have. Distances are nonnegative, for two points should not be
$-5$ units apart, but there should be a positive distance between distinct points.
The distance from Urbana to Chicago should be the same as the distance from 
Chicago to Urbana. The triangle inequality says that a “straight line” is the short- 
est path between two points. 

Example 4.103.

(i) If $X = \mathbb{R}$, then $\delta(x, y) = |x - y|$ is a metric.

This is the reason why absolute value is introduced in calculus.

(ii) Euclidean metric

If $X = \mathbb{R}^n$, $x = (x_1, \ldots, x_n)$, and $y = (y_1, \ldots, y_n)$, then $\delta(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$ is a
metric. Note that $\sqrt{x^2} = |x|$, so that this definition agrees with the metric
in part (i) when $n = 1$.

(iii) $L^2$ metric

Let $L^2[a, b]$ be the set of all square-integrable functions on $[a, b]$; that
is, $L^2[a, b] = \{f : [a, b] \to \mathbb{R} : \int_a^b f^2(x)\,dx < \infty\}$. Then

$$\delta(f, g) = \sqrt{\int_a^b (f(x) - g(x))^2\,dx}$$

is a metric on $L^2[a, b]$.

(iv) $p$-adic metric

If $p$ is a prime and $n \in \mathbb{Z}$ is nonzero, then $n = p^k u$, where $k \geq 0$ and $p \nmid u$; that
is, $p^k$ is the largest power of $p$ dividing $n$. Write $k = k(n)$. If we define
$\delta(n, m) = p^{-k(n - m)}$ for $n \neq m$, and $\delta(n, n) = 0$, then $\delta$ is a metric on $\mathbb{Z}$. ◄
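
The $p$-adic metric is perhaps the least familiar of these examples, so a small experiment may help. The sketch below uses hypothetical function names and assumes the convention that the distance from an integer to itself is 0; it computes $k(n)$ by repeated division and returns the distance as an exact fraction.

```python
from fractions import Fraction

def valuation(p, n):
    """k(n): the largest k with p**k dividing the nonzero integer n."""
    if n == 0:
        raise ValueError("k(n) is defined only for nonzero n")
    k = 0
    while n % p == 0:
        n //= p
        k += 1
    return k

def padic_distance(p, n, m):
    """p-adic distance p^(-k(n - m)), with distance 0 when n == m."""
    if n == m:
        return Fraction(0)
    return Fraction(1, p ** valuation(p, n - m))

print(padic_distance(3, 1, 10))   # 10 - 1 = 9 = 3^2, so the distance is 1/9
print(padic_distance(2, 0, 8))    # 8 = 2^3, so the distance is 1/8
print(padic_distance(5, 7, 8))    # 5 does not divide 1, so the distance is 1
```

Integers are close in this metric when their difference is divisible by a high power of $p$, which is quite different from closeness under $|x - y|$.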

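Definition. Let $\mathcal{A}$ be an alphabet. The Hamming distance^12 is the function
$\delta : \mathcal{A}^n \times \mathcal{A}^n \to \mathbb{R}$ defined by

$$\delta(w, w') = \text{the number of } i \text{ with } a_i \neq a_i',$$

where $w = (a_1, \ldots, a_n)$, $w' = (a_1', \ldots, a_n') \in \mathcal{A}^n$.

Proposition 4.104. If $\mathcal{A}$ is an alphabet, then the Hamming distance is a metric
on $\mathcal{A}^n$.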

12 After R. W. Hamming. 

Proof. Let $w = (a_1, \ldots, a_n)$, $w' = (a_1', \ldots, a_n') \in \mathcal{A}^n$. Clearly, $\delta(w, w') \geq 0$
and $\delta(w, w) = 0$. On the other hand, if $\delta(w, w') = 0$, then $a_i = a_i'$ for all $i$, and
$w = w'$. Plainly, $\delta(w, w') = \delta(w', w)$, and it remains only to prove the triangle
inequality.

If we define $\delta_i(w, w') = 1$ if $a_i \neq a_i'$ and $\delta_i(w, w') = 0$ if $a_i = a_i'$, then

$$\delta(w, w') = \sum_{i=1}^{n} \delta_i(w, w').$$

It suffices to prove $\delta_i(w, w') \leq \delta_i(w, z) + \delta_i(z, w')$ for each $i$, where $z =
(b_1, \ldots, b_n)$. Clearly, this inequality holds when $\delta_i(w, w') = 0$. If $\delta_i(w, w') \neq
0$, then $a_i \neq a_i'$. Now $\delta_i(w, z) + \delta_i(z, w')$ is either 0, 1, or 2, so that it suffices to
prove $\delta_i(w, z) + \delta_i(z, w') \neq 0$. If this sum is 0, then $\delta_i(w, z) = 0 = \delta_i(z, w')$;
that is, $a_i = b_i$ and $b_i = a_i'$, contradicting $a_i \neq a_i'$. •
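
The Hamming distance is a one-line computation, and for a small alphabet and short words the triangle inequality can be confirmed by brute force. The sketch below (the function name is hypothetical) works for words given as tuples, lists, or strings of equal length, and then checks all $8^3 = 512$ triples of words in $\mathbb{B}^3$.

```python
from itertools import product

def hamming(w, v):
    """Number of coordinates in which the words w and v differ."""
    if len(w) != len(v):
        raise ValueError("words must have the same length")
    return sum(a != b for a, b in zip(w, v))

print(hamming("10110", "11100"))      # the words differ in two places

# Exhaustive check of the triangle inequality on B^3.
words = list(product((0, 1), repeat=3))
print(all(hamming(w, v) <= hamming(w, z) + hamming(z, v)
          for w in words for v in words for z in words))
```

Of course, an exhaustive check on $\mathbb{B}^3$ is no substitute for the proof, which handles every alphabet and every length at once.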

Definition. If $A$ is an alphabet and $C \subseteq A^n$ is a code, then its minimum distance is

$$d = d(C) = \min_{\substack{w, w' \in C \\ w \ne w'}} \delta(w, w'),$$

where $\delta$ is the Hamming distance.

The minimum distance is important enough that it is usually incorporated in 
the parameters used to describe codes. 

Notation. An $(n, M, d)$-code over $A$ is a code $C \subseteq A^n$, where $A$ is an alphabet, $M = |C|$, and $d$ is its minimum distance.

Note that if $|A| = q$, then $M = q^m$, for the encoding function $E : A^m \to A^n$ is an injection. Thus, an $(n, M, d)$-code is an $[n, \log_q M]$-code in our earlier notation.

We now give precise definitions of error detecting and error correcting. 

Definition. Let $A$ be an alphabet, and let $C \subseteq A^n$ be a code. The code $C$ can detect up to $s$ errors if changing a codeword $c \in C$ in at least one and at most $s$ places does not give a codeword. The code $C$ can correct up to $t$ errors if changing a codeword $c$ in at most $t$ places gives a word $w \in A^n$ whose closest codeword is $c$.

For example, the parity check $[m + 1, m]$-code can detect up to 1 error because changing one coordinate of a codeword converts an even word into an odd word.

These definitions can be restated in terms of the Hamming distance. The code $C$ detects up to $s$ errors if, for each $c \in C$ and $w \in A^n$, we have $0 < \delta(c, w) \le s$ implies $w \notin C$. Similarly, the code $C$ corrects up to $t$ errors if $\delta(c, w) \le t$ implies that no codeword is closer to $w$ than $c$. Thus, the correction is the closest codeword if there exists a unique such, and this uniqueness says that $D : A^n \to A^m$, which sends a received word $T(E(w))$ to its closest codeword, is a well-defined function. We are not saying that this closest codeword is actually $E(w)$, but it is the best (and most natural) candidate for the truth.

Proposition 4.105. Let $A$ be an alphabet, and let $C \subseteq A^n$ be an $(n, M, d)$-code.

(i) If $d \ge s + 1$, then $C$ can detect up to $s$ errors.

(ii) If $d \ge 2t + 1$, then $C$ can correct up to $t$ errors.

Proof.

(i) If $w \ne c$ differs from $c$ in at most $s$ places, then $0 < \delta(c, w) \le s$. But if $w \in C$, then

$$s \ge \delta(c, w) \ge d > s,$$

a contradiction.

(ii) If $w$ is obtained from $c$ by changing at most $t$ places, then $\delta(c, w) \le t$. If there is a codeword $c' \ne c$ with $\delta(c', w) \le \delta(c, w)$, then the triangle inequality gives

$$2t + 1 \le d \le \delta(c, c') \le \delta(c, w) + \delta(w, c') \le 2\delta(c, w) \le 2t.$$

This contradiction shows that $C$ corrects up to $t$ errors. •

Example 4.106. 

(i) The parity check $[m + 1, m]$-code of Example 4.102(i), consisting of all words in $\mathbb{B}^{m+1}$ having an even number of 1's, is an $(m + 1, 2^m, 2)$-code. The minimum distance is at least 2, for changing one place in a word with an even number of 1's yields a word with an odd number of 1's. By Proposition 4.105, this code detects 1 error. However, Proposition 4.105 guarantees no error correction.

(ii) The triple repetition $[3m, m]$-code of Example 4.102(ii), consisting of all words in $\mathbb{B}^{3m}$ of the form $(w, w, w)$, where $w \in \mathbb{B}^m$, is a $(3m, 2^m, 3)$-code, for one must change a codeword in at least 3 places to obtain another codeword. By Proposition 4.105, this code detects 2 errors and corrects 1 error.

(iii) The two-dimensional parity $[9, 4]$-code $C$ in Example 4.102(iii) consists of all $3 \times 3$ matrices over $\mathbb{B}$ such that the sum of the entries in each row is 0 and the sums of the entries in column 1 and in column 2 are 0. Exercise 4.61 on page 427 asks you to check that at least 3 changes are needed to pass from one codeword (here, a matrix) to another codeword. Thus, $d \ge 3$, and so $C$ detects 2 errors and corrects 1 error. ◄
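For small parameters, the minimum distances claimed in parts (i) and (ii) can be confirmed by listing all codewords. This is a minimal sketch, assuming binary words are represented as tuples of 0's and 1's; the helper names are hypothetical.

```python
from itertools import combinations, product

def hamming_distance(w, w2):
    return sum(a != b for a, b in zip(w, w2))

def minimum_distance(code):
    """Smallest Hamming distance between two distinct codewords."""
    return min(hamming_distance(c, c2) for c, c2 in combinations(code, 2))

def parity_check_code(m):
    """All words (w, b) where b makes the number of 1's even."""
    return [w + (sum(w) % 2,) for w in product((0, 1), repeat=m)]

def triple_repetition_code(m):
    """All words (w, w, w) with w a binary word of length m."""
    return [w * 3 for w in product((0, 1), repeat=m)]

def capabilities(d):
    """(s, t) guaranteed by Proposition 4.105: d >= s + 1 and d >= 2t + 1."""
    return d - 1, (d - 1) // 2
```

For instance, `capabilities(minimum_distance(triple_repetition_code(2)))` gives the pair of guaranteed detection and correction counts for the $[6, 2]$ triple repetition code.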

A code with large minimum distance $d$ can correct many errors. For example, a 101-times binary repetition code $C$ is a $[101m, m]$-code with encoding function $E : \mathbb{B}^m \to \mathbb{B}^{101m}$ which repeats an $m$-letter word 101 times. Here, $d = 101$, and so Proposition 4.105 shows that $C$ can correct up to 50 errors. Obviously, $C$ is a rather impractical code. We can measure this impracticality.

Definition. The rate of information of an $[n, m]$-code is $m/n$. If $|A| = q$, then we have seen that $M = q^m$, and so the rate of information of an $(n, M, d)$-code is $(\log_q M)/n$.

This notion of rate is a natural one: it says that $n$ letters are used to send an $m$-letter message. The multiple repetition $[101m, m]$-code just described is inefficient, for it has rate $\frac{1}{101}$: sending a short message requires a very large number of letters. On the other hand, the irredundant $[m, m]$-code with $E = 1_{A^m} : A^m \to A^m$, which merely repeats a message verbatim, has rate 1. Thus, codes with small rates can correct many errors, but they are inefficient; codes with large rates (i.e., rates near 1) may not even detect errors. An $(n, M, d)$-code over $A$ can correct $t$ errors if $d \ge 2t + 1$; hence, it is more accurate when $d$ is large. Exercise 4.62 on page 428 gives the Singleton bound: if $|A| = q$, then $M = q^m \le q^{n-d+1}$. Therefore, $m + d \le n + 1$. If $m$ is large, then the rate $m/n$ is close to 1, but now $d$ is small. On the other hand, if $m$ is small, then the rate is small, but now $d$ is large; that is, the code can correct many errors. Thus, we seek a compromise: $(n, M, d)$-codes in which both $M$ and $d$ are "large," for such codes are both efficient and accurate. There are other bounds on the number of codewords in an $(n, M, d)$-code over an alphabet $A$ with $|A| = q$:

$$\frac{q^n}{\sum_{i=0}^{d-1} \binom{n}{i}(q - 1)^i} \le M \le \frac{q^n}{\sum_{i=0}^{t} \binom{n}{i}(q - 1)^i},$$

where $t = \lfloor (d - 1)/2 \rfloor$ (see Exercises 4.63 and 4.64 on page 428). The lower bound is called the Gilbert-Varshamov Bound; it holds for a code having the largest possible $M$ for the given $n$ and $d$. The upper bound, which holds for every $(n, M, d)$-code, is called the Hamming Bound (or the Sphere-Packing Bound).
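The three bounds are easy to evaluate numerically. The sketch below assumes the usual reading $t = \lfloor (d-1)/2 \rfloor$ in the Hamming bound and rounds each bound to an integer in the safe direction; the function names are hypothetical.

```python
from math import comb

def ball_size(n, q, r):
    """Number of words within Hamming distance r of a fixed word in A^n, |A| = q."""
    return sum(comb(n, i) * (q - 1) ** i for i in range(r + 1))

def singleton_bound(n, q, d):
    """Upper bound q^(n - d + 1) on the number of codewords."""
    return q ** (n - d + 1)

def hamming_bound(n, q, d):
    """Upper bound floor(q^n / ball_size(n, q, t)) with t = (d - 1) // 2."""
    t = (d - 1) // 2
    return q ** n // ball_size(n, q, t)

def gilbert_varshamov_bound(n, q, d):
    """Lower bound ceil(q^n / ball_size(n, q, d - 1)) on the best possible M."""
    return -(-q ** n // ball_size(n, q, d - 1))
```

With $n = 7$, $q = 2$, $d = 3$ the Hamming bound evaluates to 16, which is exactly the number of codewords in a binary $[7, 4]$-code of minimum distance 3.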


Linear Codes 





If $A$ is an arbitrary set, then defining a function $E : A^m \to A^n$ can be quite complicated. On the other hand, if $A$ is a field, then $A^m$ and $A^n$ are vector spaces equipped with the standard bases. Moreover, if $E$ is a linear transformation, then it can be efficiently described by a formula: there is an $m \times n$ matrix $A$ with $E(w) = wA$, where $w$ is a $1 \times m$ row vector.


Definition. An $[n, m]$-linear code over a finite field $k$ is an $m$-dimensional subspace of $k^n$. When dealing with a linear code, we assume that the encoding function $E : k^m \to k^n$ and the decoding function $D : k^n \to k^m$ are linear transformations.

One usually has no control over transmission functions, and so it is not reasonable to assume that a transmission function of a linear code is a linear transformation.

In contrast to earlier sections, we will now view vectors $w \in k^n$ as $1 \times n$ rows, and we will denote $n \times 1$ columns as $w^T$.

If $k = \mathbb{F}_q$ is the finite field with $q$ elements, then an $[n, m]$-linear code is an $(n, q^m, d)$-code, where $d$ is its minimum distance. In a linear code, there is another way of finding its minimum distance.


Definition. If $w = (a_1, \ldots, a_n) \in k^n$, where $k$ is a field, then the support of $w$ is defined by

$$\operatorname{Supp}(w) = \{i : a_i \ne 0\}.$$

If $C$ is a linear $(n, M, d)$-code and $w \in C$, then the Hamming weight of $w$ is

$$\operatorname{wt}(w) = |\operatorname{Supp}(w)|;$$

that is, $\operatorname{wt}(w)$ is the number of nonzero coordinates in $w$.

Note that $\operatorname{wt}(w) = \delta(w, 0)$, where $\delta$ is the Hamming distance and $0 = (0, \ldots, 0)$ (which lies in $C$ because $C$ is a subspace of $k^n$).

Proposition 4.107. If $C$ is a linear $(n, M, d)$-code over a finite field $\mathbb{F}_q$, then

$$d = \min_{\substack{c \in C \\ c \ne 0}} \operatorname{wt}(c).$$

Thus, $d$ is the smallest weight among nonzero codewords.

Proof. Since $C$ is a subspace, $w, w' \in C$ implies $w - w' \in C$. Moreover, $\delta(w, w') = \operatorname{wt}(w - w')$, for $w$ and $w'$ differ exactly in the coordinates where $w - w'$ is nonzero. Thus,

$$d = \min_{\substack{w, w' \in C \\ w \ne w'}} \delta(w, w') = \min_{\substack{w, w' \in C \\ w \ne w'}} \operatorname{wt}(w - w') = \min_{\substack{c \in C \\ c \ne 0}} \operatorname{wt}(c). \quad •$$

We are going to introduce a matrix which describes a given linear code, but 
we first introduce a partition notation for a matrix U . 

Notation. If $A$ is an $m \times r$ matrix and $B$ is an $m \times s$ matrix, then

$$U = [A \mid B]$$

is the $m \times (r + s)$ matrix whose first $r$ columns give the matrix $A$ and whose last $s$ columns give the matrix $B$.

It follows from the formula defining matrix multiplication that if $N$ is any $p \times m$ matrix, where $p \ge 1$, then

$$N[A \mid B] = [NA \mid NB].$$

Let $\sigma \in S_n$ be a permutation, and let $P_\sigma$ be the $n \times n$ permutation matrix obtained from the identity matrix $I$ by permuting its columns by $\sigma$. If $k$ is a field and $c = (a_1, \ldots, a_n) \in k^n$ is a $1 \times n$ row matrix, then

$$cP_\sigma = (a_1, \ldots, a_n)P_\sigma = (a_{\sigma(1)}, \ldots, a_{\sigma(n)}).$$

[For example, the first coordinate of $cP_\sigma$ is the dot product of $c$ with the first column of $P_\sigma$. But $c \cdot e_{\sigma(1)} = (a_1, \ldots, a_n) \cdot (0, \ldots, 1, \ldots, 0)$, where 1 occurs in the $\sigma(1)$th coordinate. Thus, this entry is $a_{\sigma(1)}$.] If $P_\sigma$ is an $n \times n$ permutation matrix, define $\sigma_* : k^n \to k^n$ by

$$\sigma_*(c) = cP_\sigma.$$

Definition. Two $[n, m]$-linear codes $C$, $C'$ over a field $k$ are permutation-equivalent if there is $\sigma \in S_n$ with $\sigma_*(C) = C'$; that is, $c = (a_1, \ldots, a_n) \in C$ if and only if $(a_{\sigma(1)}, \ldots, a_{\sigma(n)}) \in C'$.

It is easy to see that permutation-equivalence is an equivalence relation on the family of all linear codes in $k^n$. Permutation-equivalent codes are essentially the same. For example, if all words in a code $C$ are reversed, then the new, reversed, code has the same parameters as the original code.

If $E : k^m \to k^n$ is an encoding function with $C = \operatorname{im} E$, then $E' = \sigma_* \circ E$ is an encoding function for $C'$.

Proposition 4.108. If $C$ is a linear $[n, m]$-code over a field $k$, then there are a linear code $C'$ permutation-equivalent to $C$ and an $m \times n$ matrix $G$ of the form $G = [I \mid B]$, where $I$ is the $m \times m$ identity matrix, such that

$$C' = \{w'G : w' \in k^m\}.$$

Proof. If $e_1, \ldots, e_m$ is the standard basis of $k^m$ and $c_1, \ldots, c_m$ is some basis of $C$, define a linear transformation $E : k^m \to k^n$ by $E(e_i) = c_i$. Now $E$ is an injection, by Lemma 4.76(b); in fact, $E$ is an isomorphism $k^m \to C$. By Proposition 4.63, $E(w) = Aw^T$, where $w \in k^m$ is viewed as a $1 \times m$ row, and $A$ is the $n \times m$ matrix whose columns are the vectors $E(e_i)^T$. Since it is customary in coding theory to consider vectors in $k^m$ as rows instead of as columns, we write $E(w) = wN$, where $N = A^T$ is an $m \times n$ matrix.

By Proposition 4.39, Gaussian elimination converts the matrix $N$ to a matrix $G$ in echelon form. There is a nonsingular $m \times m$ matrix $Q$ and an $n \times n$ permutation matrix $P_\sigma$ with $G = QNP_\sigma = [U \mid B]$ for an $m \times m$ matrix $U$ in echelon form and an $m \times (n - m)$ matrix $B$. Since $E$ is injective, the matrix $U$ has no zero rows, and so it is the identity; thus, $G = [I \mid B]$. Define

$$C' = \{w'G : w' \in k^m\}.$$

Now $e_1G, \ldots, e_mG$ is a linearly independent list, and so $m \le \dim(C')$. We claim that $C$ and $C'$ are permutation-equivalent; that is, $C' = \sigma_*(C)$. Let $c' = w'G \in C'$, define $w = w'Q$, and define $c = wN$. Note that $c \in C$ because $wN = E(w) \in C$. Then

$$c' = w'G = w'QNP_\sigma = wNP_\sigma = cP_\sigma = \sigma_*(c).$$

Therefore, $C' \subseteq \sigma_*(C)$. Hence, $m \le \dim(C') \le \dim(\sigma_*(C)) = \dim(C) = m$. By Corollary 4.25(b), we have $C' = \sigma_*(C)$, and so $C$ and $C'$ are permutation-equivalent. •

Definition. If $C$ is an $[n, m]$-linear code over a field $k$, then an $m \times n$ matrix $G$ with $C = \{wG : w \in k^m\}$ is called a generating matrix of $C$. An echelon generating matrix of $C$ is a generating matrix of the form $G = [I \mid B]$, where $I$ is the $m \times m$ identity matrix.

Every linear code $C$ has a generating matrix: let $G$ be an $m \times n$ matrix whose rows form a basis of $C$. In light of Proposition 4.108, we may assume that every linear code has an echelon generating matrix. A generating matrix for a code $C$ gives an encoding function for it: define $E(w) = wG$.

The examples of codes we have given so far are, in fact, binary linear codes; that is, they are linear codes over $k = \mathbb{B}$.

Example 4.109. 


(i) Consider the parity check $[m + 1, m]$-code $C$ in Example 4.102(i). The codewords are all those $c = (b_1, \ldots, b_{m+1}) \in \mathbb{B}^{m+1}$ for which $\sum b_i = 0$; the codewords form a subspace, and so $C$ is a linear code. Recall that the encoding function $E : \mathbb{B}^m \to \mathbb{B}^{m+1}$ is defined by

$$E(a_1, \ldots, a_m) = (a_1, \ldots, a_m, b),$$

where $b = \sum_{i=1}^m a_i$. It is easy to see that $E$ is a linear transformation. Moreover, an echelon generating matrix is the $m \times (m + 1)$ matrix

$$G = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 & 1 \\ 0 & 1 & 0 & \cdots & 0 & 1 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & 1 & 1 \end{bmatrix}.$$

In partition notation, $G = [I \mid B]$, where $I$ is the $m \times m$ identity matrix and $B$ is the column of all 1's.

(ii) Consider the two-dimensional parity $[9, 4]$-code in Example 4.102(iii). By definition,

$$E : (a, b, c, d) \mapsto \begin{bmatrix} a & b & a + b \\ c & d & c + d \\ a + c & b + d & a + b + c + d \end{bmatrix};$$

it is easy to check that $E$ is a linear transformation. We find a generating matrix $G$ by evaluating $E$ on the standard basis of $\mathbb{B}^4$, for the $i$th row of $G$ is $e_iG$. Listing the coordinates in the order $a, b, c, d$ followed by the five parity entries, we obtain

$$G = \begin{bmatrix} 1 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & 0 & 1 & 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 0 & 0 & 1 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 0 & 1 & 0 & 1 & 1 \end{bmatrix}.$$

Thus, $G$ is an echelon generating matrix.

(iii) Consider the triple repetition $[3m, m]$-code in Example 4.102(ii). An echelon generating matrix for $C$ is $G = [I \mid I \mid I]$, where $I$ is the $m \times m$ identity matrix.

(iv) The examples above are too simple. Given a linear $[n, m]$-code $C$ with encoding function $E : k^m \to k^n$, one generating matrix $N$ for $C$ has rows $E(e_1), \ldots, E(e_m)$, where $e_1, \ldots, e_m$ is the standard basis of $k^m$. In general, Gaussian elimination is needed to convert $N$ into an echelon generating matrix for $C$ (or a linear code $C'$ permutation-equivalent to $C$). ◄
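The recipe in part (ii), evaluating the encoding function on the standard basis to obtain the rows of a generating matrix, can be sketched as follows. The coordinate order (message letters first, then the five parity entries) is an assumption chosen so that the result is in echelon form; the function names are hypothetical.

```python
def encode_2d_parity(a, b, c, d):
    """Two-dimensional parity encoder over F_2: message letters, then parity entries."""
    return [a, b, c, d,
            (a + b) % 2, (c + d) % 2,
            (a + c) % 2, (b + d) % 2,
            (a + b + c + d) % 2]

def generating_matrix(encode, m):
    """Rows are the images of the standard basis vectors e_1, ..., e_m."""
    rows = []
    for i in range(m):
        e_i = [1 if j == i else 0 for j in range(m)]
        rows.append(encode(*e_i))
    return rows

def encode_with_matrix(w, G):
    """Row vector w times matrix G over F_2."""
    return [sum(wi * g for wi, g in zip(w, col)) % 2 for col in zip(*G)]
```

Because the encoder is linear, `encode_with_matrix(w, G)` agrees with `encode_2d_parity(*w)` for every message `w`.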

If $G = [I \mid B]$ is an echelon generating matrix of a linear $[n, m]$-code $C$, then

$$wG = w[I \mid B] = [w \mid wB]$$

for all $w \in k^m$. If there were no errors in transmitting $C$, then it is obvious how to decode a codeword $wG$: just take its first $m$ coordinates. The last $n - m$ columns of an echelon generating matrix $G = [I \mid B]$ should be viewed as generalizing the last column of the echelon generating matrix of the $[m + 1, m]$-parity check code in Example 4.109(i). Thus, the last columns $B$ in $G = [I \mid B]$ are a generalized parity check providing redundancy to help decode a message sent over a noisy channel.

In Example 4.109, we started with a linear code and found an echelon generating matrix for it. Now we start with an echelon generating matrix $G$ and use it to construct a code $C = \{wG : w \in k^m\}$.


Example 4.110. 

Consider the $4 \times 7$ matrix

$$G = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 & 1 & 1 & 1 \end{bmatrix},$$

and define the Hamming $[7, 4]$-code to be $C = \{wG : w \in \mathbb{B}^4\}$.

Obviously, the rate of information is $r = \frac{4}{7}$. Let us compute the minimum distance $d$ of $C$. Denote the rows of $G$ by $\gamma_1, \gamma_2, \gamma_3, \gamma_4$. Since $k = \mathbb{B}$ here, linear combinations $\sum a_i\gamma_i$ are just sums (for $a_i$ is either 0 or 1). By Proposition 4.107, we can compute $d$ by computing weights of codewords. Now

$$\operatorname{wt}(\gamma_1) = 3; \quad \operatorname{wt}(\gamma_2) = 3; \quad \operatorname{wt}(\gamma_3) = 3; \quad \operatorname{wt}(\gamma_4) = 4.$$

There are $\binom{4}{2} = 6$ sums of two rows, and a short calculation shows that the minimum weight of these is 3; there are $\binom{4}{3} = 4$ sums of three rows, and the minimum weight is 3; the sum of all four rows has weight 7. We conclude that the minimum distance $d$ of $C$ is 3, and so $C$ detects 2 errors and corrects 1 error, by Proposition 4.105.

This construction can be generalized. If $\ell \ge 3$, there are $2^\ell - 1$ nonzero words in $\mathbb{B}^\ell$. Define a $(2^\ell - 1 - \ell) \times (2^\ell - 1)$ matrix $G = [I \mid B]$, where the rows of $B$ are all $w \in \mathbb{B}^\ell$ with $\operatorname{wt}(w) \ge 2$ (there are $2^\ell - 1 - \ell$ such words). Define the Hamming $[2^\ell - 1, 2^\ell - 1 - \ell]$-code to be $C = \{wG : w \in \mathbb{B}^{2^\ell - 1 - \ell}\}$. Of course, $G$ is an echelon generating matrix for $C$. The rate of information of a Hamming code is

$$\frac{2^\ell - 1 - \ell}{2^\ell - 1} = 1 - \frac{\ell}{2^\ell - 1};$$

as $\ell$ gets large, the rate of information of the $\ell$th Hamming code gets very close to 1. It can be shown that the minimum distance of every Hamming code is 3, as in the $[7, 4]$-code. ◄
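The general construction can be tried out for small $\ell$. The sketch below takes the rows of $B$ to be the binary words of length $\ell$ and weight at least 2, which for $\ell = 3$ reproduces the $4 \times 7$ matrix of this example, and finds the minimum weight by listing all codewords. The function names are hypothetical, and the enumeration is practical only for small $\ell$.

```python
from itertools import product

def hamming_generating_matrix(ell):
    """G = [I | B]; the rows of B are the words of F_2^ell of weight >= 2."""
    b_rows = [list(w) for w in product((0, 1), repeat=ell) if sum(w) >= 2]
    m = len(b_rows)  # equals 2**ell - 1 - ell
    return [[1 if j == i else 0 for j in range(m)] + b_rows[i] for i in range(m)]

def codewords(G):
    """All products wG over F_2."""
    m = len(G)
    for w in product((0, 1), repeat=m):
        yield [sum(wi * g for wi, g in zip(w, col)) % 2 for col in zip(*G)]

def minimum_weight(G):
    """Smallest weight of a nonzero codeword (Proposition 4.107)."""
    return min(sum(c) for c in codewords(G) if any(c))
```

For $\ell = 4$ this builds an $11 \times 15$ matrix and checks all $2^{11}$ codewords.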

We now wish to construct (linear) codes which can correct many errors and 
which also have high rates of information. 

Recall Theorem 3.115: If $k$ is a field, if $f(x) \in k[x]$ has degree $n$, and if $I = (f(x))$ is the principal ideal generated by $f(x)$, then the quotient ring $k[x]/I$ is a vector space over $k$ with basis the list $1, z, z^2, \ldots, z^{n-1}$, where $z = x + I$. Thus, $k[x]/I$ is $n$-dimensional, and there is a (vector space) isomorphism $k^n \to k[x]/I$. Let us denote words in $k^n$ by $(a_0, \ldots, a_{n-1})$ instead of by $(a_1, a_2, \ldots, a_n)$, for then $(a_0, a_1, \ldots, a_{n-1}) \mapsto a_0 + a_1z + \cdots + a_{n-1}z^{n-1}$ is an isomorphism $k^n \to k[x]/I$.

Definition. A cyclic code of length $n$ over a field $k$ is a linear code $C$ such that $(a_0, a_1, \ldots, a_{n-1}) \in C$ implies $(a_{n-1}, a_0, \ldots, a_{n-2}) \in C$.

That $k[x]/I$ is a commutative ring in addition to being a vector space will now be exploited. As above, identify a word $(a_0, a_1, \ldots, a_{n-1}) \in k^n$ with the (coset of) a polynomial $a_0 + a_1z + \cdots + a_{n-1}z^{n-1} \in k[x]/I$.

Proposition 4.111. Let $k$ be a finite field, let $I = (x^n - 1)$ be the principal ideal in $k[x]$ generated by $x^n - 1$, and let $z = x + I$. Then $C \subseteq k[x]/I$ is a cyclic code if and only if $C$ is an ideal in the commutative ring $k[x]/I$. Moreover, $C = (g(z))$, where $g(x)$ is a monic divisor of $x^n - 1$ in $k[x]$.

Proof. Let $C$ be an ideal in $k[x]/I$, and let $c = a_0 + a_1z + \cdots + a_{n-1}z^{n-1} \in C$. Since $C$ is an ideal, $C$ contains $zc = a_0z + a_1z^2 + \cdots + a_{n-2}z^{n-1} + a_{n-1}z^n$. But $z^n = 1$ (because $z$ is a root of $x^n - 1$). Hence, $a_{n-1} + a_0z + \cdots + a_{n-2}z^{n-1} \in C$, and $C$ is cyclic.

Conversely, assume that $C$ is a cyclic code. Since $C$ is a linear code, $C$ is closed under addition and scalar multiplication by elements in $k$. As we have just seen, multiplication by $z$ corresponds to shifting coefficients one step to the right (and making $a_{n-1}$ the constant term). The reader may prove, by induction on $i$, that $C$ is closed under multiplication by $z^i$, and hence by all elements $b_0 + b_1z + \cdots + b_{n-1}z^{n-1} \in k[x]/I$. Therefore, $C$ is an ideal.

Let $\beta : k[x] \to k[x]/I$ be the natural map, and consider the inverse image $J = \beta^{-1}(C) = \{f(x) \in k[x] : f(z) \in C\}$. By Exercise 3.45 on page 248, $J$ is an ideal in $k[x]$ containing $x^n - 1$. But every ideal in $k[x]$ is a principal ideal, by Theorem 3.59, so that there is a monic $g(x) \in k[x]$ with $J = (g(x))$. Since $x^n - 1 \in J$, we have $x^n - 1 = h(x)g(x)$ for some polynomial $h(x)$; that is, $g(x) \mid (x^n - 1)$. Finally, since $J$ is generated by $g(x)$, its image $C = \beta(J)$ is generated by $\beta(g(x)) = g(z)$. •
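The key step of the proof, that multiplying by $z$ in $k[x]/(x^n - 1)$ is a cyclic shift of the coefficients, can be seen concretely. This is a minimal sketch assuming $k = \mathbb{F}_q$ with $q$ prime (so that arithmetic is just reduction mod $q$) and coefficient lists with the constant term first; the function names are hypothetical.

```python
def multiply_mod_xn_minus_1(a, b, q):
    """Product of coefficient lists a, b in F_q[x]/(x^n - 1), where n = len(a), q prime."""
    n = len(a)
    out = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            # x^(i+j) is reduced using x^n = 1
            out[(i + j) % n] = (out[(i + j) % n] + ai * bj) % q
    return out

def cyclic_shift(word):
    """(a_0, ..., a_{n-1}) -> (a_{n-1}, a_0, ..., a_{n-2})."""
    return [word[-1]] + list(word[:-1])
```

Multiplying any word by the list representing $z$, namely `[0, 1, 0, ..., 0]`, returns the same list as `cyclic_shift`.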


Definition. Let $C \subseteq k[x]/I$ be a cyclic code, where $I = (x^n - 1)$. A monic polynomial $g(x) \in k[x]$ is called a generating polynomial for $C$ if $C = (g(z))$, where $z = x + I$.

As in Proposition 4.111, a generating polynomial $g(x)$ of a cyclic code of length $n$ is a divisor of $x^n - 1$.


Corollary 4.112. If $C$ is a cyclic code of length $n$ over a field $k$ with generating polynomial $g(x)$, then $\dim(C) = n - \deg(g)$.

Proof. Since $g(x) \mid (x^n - 1)$, there is an inclusion of ideals $I = (x^n - 1) \subseteq (g(x)) = J$. Regarding $k[x]$ and its quotients merely as vector spaces over $k$, we see that "enlargement of coset" $\gamma : k[x]/I \to k[x]/J$, given by $h(x) + I \mapsto h(x) + J$, is a surjective linear transformation. To compute $\ker \gamma$, consider the diagram

$$\begin{array}{ccc} k[x] & \xrightarrow{\ \alpha\ } & k[x]/I \\ & {\scriptstyle \beta} \searrow & \downarrow {\scriptstyle \gamma} \\ & & k[x]/J \end{array}$$

where $\alpha$ and $\beta$ are the natural maps. Now $\beta = \gamma \circ \alpha$, and $\alpha$, $\beta$, and $\gamma$ are surjective. Thus, the hypothesis of Exercise 2.91 on page 188 holds, and so

$$\ker \gamma = \alpha(\ker \beta) = \alpha((g(x))) = (g(z)) = C,$$

where $z = x + I$. As vector spaces, $(k[x]/I)/C \cong k[x]/J$ (this is just the first isomorphism theorem). Hence,

$$\dim(k[x]/I) - \dim(C) = \dim(k[x]/J).$$

But $\dim(k[x]/I) = \deg(x^n - 1) = n$ and $\dim(k[x]/J) = \dim(k[x]/(g(x))) = \deg(g)$, so that $\dim(C) = n - \deg(g)$. •

Corollary 4.113. If $C$ is a cyclic code of length $n$ with generating polynomial $g(x) = g_0 + g_1x + \cdots + g_sx^s$, then a generating matrix for $C$ is the $(n - s) \times n$ matrix:

$$G = \begin{bmatrix} g_0 & g_1 & g_2 & \cdots & g_s & 0 & 0 & \cdots & 0 \\ 0 & g_0 & g_1 & g_2 & \cdots & g_s & 0 & \cdots & 0 \\ 0 & 0 & g_0 & g_1 & g_2 & \cdots & g_s & \cdots & 0 \\ \vdots & & & \ddots & & & & \ddots & \vdots \\ 0 & 0 & \cdots & 0 & g_0 & g_1 & g_2 & \cdots & g_s \end{bmatrix}.$$

Proof. Since $C$ is an ideal, $g(x), xg(x), x^2g(x), \ldots, x^{n-s-1}g(x)$ are codewords, and these codewords correspond to the rows of $G$. Write $G = [T \mid S]$, where $T$ is the $(n - s) \times (n - s)$ submatrix consisting of the first $n - s$ columns of $G$. As $T$ is an upper triangular matrix with all diagonal entries $g_0$, we have $\det(T) = g_0^{n-s}$, by Exercise 4.44 on page 398. Now $g_0$ is, to sign, the product of all the roots of $g(x)$; but the roots of $g(x)$ are roots of unity, because $g(x) \mid (x^n - 1)$, and so $g_0 \ne 0$. Hence, $\det(T) \ne 0$, and the list of $n - s$ rows of $G$ is linearly independent. Since $\dim(C) = n - s$, by Corollary 4.112, the list of rows of $G$ is a basis of $C$, and so $G$ is a generating matrix for $C$. •
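The shifted-coefficient matrix of Corollary 4.113 is simple to build, and for a small code one can confirm that its row space is closed under cyclic shifts exactly when $g(x)$ divides $x^n - 1$. The sketch assumes a prime field $\mathbb{F}_q$ and lists the coefficients $g_0, \ldots, g_s$ from the constant term upward, as in the corollary; the names are hypothetical.

```python
from itertools import product

def cyclic_generating_matrix(g, n):
    """Rows are g(x), x g(x), ..., x^(n-s-1) g(x), where g lists g_0, ..., g_s."""
    s = len(g) - 1
    return [[0] * i + list(g) + [0] * (n - s - 1 - i) for i in range(n - s)]

def span(G, q):
    """All F_q-linear combinations of the rows of G, as a set of tuples (q prime)."""
    m = len(G)
    return {
        tuple(sum(w[i] * G[i][j] for i in range(m)) % q for j in range(len(G[0])))
        for w in product(range(q), repeat=m)
    }

def is_cyclic(code):
    """True if the code is closed under (a_0, ..., a_{n-1}) -> (a_{n-1}, a_0, ...)."""
    return all((c[-1],) + c[:-1] in code for c in code)
```

With $g(x) = 1 + x + x^3$ and $n = 7$ over $\mathbb{F}_2$ the span has $2^4$ words and is cyclic; with $g(x) = 1 + x + x^2$, which does not divide $x^7 - 1$, the span of the shifted rows is not cyclic.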

The roots of $x^n - 1$ are $n$th roots of unity. Recall that an element $z$ in a field $k$ is a primitive $n$th root of unity if $z^n = 1$ but $z^i \ne 1$ for all $i$ with $0 < i < n$.

Lemma 4.114. Let $\mathbb{F}_q$ denote the finite field of $q$ elements. If $n$ is a positive integer, then there exists a primitive $n$th root of unity in some extension field of $\mathbb{F}_q$ if and only if $(n, q) = 1$.

Proof. Assume that $(n, q) = 1$, and let $E/\mathbb{F}_q$ be a splitting field of $f(x) = x^n - 1$ over $\mathbb{F}_q$ (actually, we need only that $f(x)$ splits over $E$, which follows from Kronecker's Theorem 3.118). Now the derivative $f'(x) = nx^{n-1} \ne 0$, so that $(f, f') = 1$ (they have no common root); hence, $f(x)$ has no repeated roots, by Exercise 3.63 on page 271. Thus, if $K$ is the set of all the roots of $f(x) = x^n - 1$, then $K$ is a multiplicative group of order $n$. But $K$ is cyclic, by Theorem 3.122, and a generator of $K$ must be a primitive $n$th root of unity.

Conversely, assume that there exists a primitive $n$th root of unity. Now $q = p^s$ for some prime $p$. If $(n, q) \ne 1$, then $p \mid n$; that is, $n = pu$ for some integer $u$. Hence, $x^n - 1 = x^{pu} - 1 = (x^u - 1)^p$, and so the multiplicative group of all $n$th roots of unity has fewer than $n$ elements. Therefore, there is no primitive $n$th root of unity. •

The following theorem will enable us to construct codes that can correct many errors.

Theorem 4.115 (Bose-Chaudhuri-Hocquenghem). Let $\mathbb{F}_q$ be the field of $q$ elements, let $n$ be a positive integer with $(n, q) = 1$, and let $C$ be a cyclic code with generating polynomial $g(x)$. If $\zeta$ is a primitive $n$th root of unity, and if consecutive powers $\zeta^u, \zeta^{u+1}, \ldots, \zeta^{u+\ell}$ are roots of $g(x)$, where $u + \ell < n$, then $d = d(C) \ge \ell + 2$.

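Proof. A codeword $c = (c_0, c_1, \ldots, c_{n-1}) \in (\mathbb{F}_q)^n$ is identified with the polynomial $c_0 + c_1x + \cdots + c_{n-1}x^{n-1} \in \mathbb{F}_q[x]$, and its weight $\operatorname{wt}(c)$ is the number of its nonzero coefficients. By Proposition 4.107, it suffices to prove that every nonzero codeword has at least $\ell + 2$ nonzero coefficients. Suppose, on the contrary, that there exists a nonzero codeword $c(x)$ with $\operatorname{wt}(c) < \ell + 2$; thus, $c(x) = c_{i_1}x^{i_1} + \cdots + c_{i_{\ell+1}}x^{i_{\ell+1}}$, where $i_1 < \cdots < i_{\ell+1}$. If $a$ lies in $\mathbb{F}_q$ (or in an extension field), then $c(a)$ is the dot product

$$c(a) = [c_{i_1}\ c_{i_2}\ \cdots\ c_{i_{\ell+1}}]\,[a^{i_1}\ a^{i_2}\ \cdots\ a^{i_{\ell+1}}]^T.$$

Now form the $(\ell + 1) \times (\ell + 1)$ matrix $W$ whose $j$th row, for $0 \le j \le \ell$, arises from $\zeta^{u+j}$:

$$W = \begin{bmatrix} \zeta^{ui_1} & \zeta^{ui_2} & \cdots & \zeta^{ui_{\ell+1}} \\ \zeta^{(u+1)i_1} & \zeta^{(u+1)i_2} & \cdots & \zeta^{(u+1)i_{\ell+1}} \\ \vdots & \vdots & & \vdots \\ \zeta^{(u+\ell)i_1} & \zeta^{(u+\ell)i_2} & \cdots & \zeta^{(u+\ell)i_{\ell+1}} \end{bmatrix}.$$

If $c_* = [c_{i_1}\ c_{i_2}\ \cdots\ c_{i_{\ell+1}}]$, then

$$Wc_*^T = [c(\zeta^u)\ c(\zeta^{u+1})\ \cdots\ c(\zeta^{u+\ell})]^T = 0,$$

because every codeword is a multiple of $g(x)$, and $g(x)$ vanishes at these powers of $\zeta$.

Factoring out $\zeta^{ui_k}$ from the $k$th column of $W$ gives the (transpose of the) $(\ell + 1) \times (\ell + 1)$ Vandermonde matrix

$$V = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ \zeta^{i_1} & \zeta^{i_2} & \cdots & \zeta^{i_{\ell+1}} \\ \vdots & \vdots & & \vdots \\ \zeta^{\ell i_1} & \zeta^{\ell i_2} & \cdots & \zeta^{\ell i_{\ell+1}} \end{bmatrix}.$$

By Exercise 4.45 on page 398, we have

$$\det(W) = \zeta^{ui_1} \cdots \zeta^{ui_{\ell+1}} \det(V) = \zeta^{ui_1} \cdots \zeta^{ui_{\ell+1}} \prod_{j < k} (\zeta^{i_k} - \zeta^{i_j}).$$

We claim that all the $\zeta^{i_j}$ are distinct. If $j < k$, then $0 < i_k - i_j < n$ (because $i_j < i_k < n$), and so $\zeta^{i_k - i_j} \ne 1$. Hence, if $\zeta^{i_k} = \zeta^{i_j}$, then we contradict $\zeta$ being a primitive $n$th root of unity. We conclude that $\det(W) \ne 0$. But $c_* \ne 0$ and $Wc_*^T = 0$, contradicting the nonsingularity of $W$. Therefore, no nonzero codeword of weight $< \ell + 2$ exists, and $d(C) \ge \ell + 2$. •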

Definition. A linear code over a field $k$ is a BCH-code of length $n$ if it is a cyclic code having a generating polynomial $g(x)$ which has consecutive powers $\zeta^u, \zeta^{u+1}, \ldots, \zeta^{u+\ell}$ among its roots, where $\zeta$ is a primitive $n$th root of unity and $0 \le u < u + \ell < n$.

Corollary 4.116. Let $C$ be a BCH-code of length $n$ with generating polynomial $g(x)$. If consecutive powers $\zeta^u, \zeta^{u+1}, \ldots, \zeta^{u+\ell}$ occur among the roots of $g(x)$, where $\ell = 2t$ or $\ell = 2t + 1$ and $u + \ell < n$, then $C$ corrects up to $t$ errors.

Proof. By Theorem 4.115, we have $d(C) \ge \ell + 2 \ge 2t + 1$, and so Proposition 4.105 applies. •

Corollary 4.117. For any prime $p$ and any positive integer $t$, there exists a BCH-code $C$ over a finite field of characteristic $p$ which corrects up to $t$ errors.

Proof. Let $k = \mathbb{F}_q$, where $q$ is a power of $p$ and $2t + 1 < q - 1$. By Theorem 3.122, the multiplicative group $k^\times$ is a cyclic group of order $q - 1$, and a generator $\zeta$ is a primitive $(q - 1)$th root of unity; hence, $\zeta, \zeta^2, \ldots, \zeta^{2t+1}$ are distinct. For each $j$, Corollary 3.117 gives a monic irreducible polynomial $h_j(x) \in k[x]$ having $\zeta^j$ as a root. Finally, define

$$g(x) = \operatorname{lcm}\{h_1(x), \ldots, h_{2t+1}(x)\},$$

and define $C$ to be the BCH-code with generating polynomial $g(x)$. The result now follows from Corollary 4.116. •

To see whether a cyclic code is a BCH-code, we must determine the roots of 
its generating polynomial. Thus, finite fields must be investigated in more detail. 

Example 4.118. 

Let us describe $\mathbb{F}_8$. We know that its multiplicative group of nonzero elements is cyclic of order 7, and a generator $\zeta$ is a primitive 7th root of unity. There is an irreducible polynomial $m(x) \in \mathbb{F}_2[x]$ having $\zeta$ as a root. As in Proposition 3.116, $\deg(m) = 3 = \dim_{\mathbb{F}_2}(\mathbb{F}_8)$. In Example 3.99, we saw that there are only two irreducible cubics in $\mathbb{F}_2[x]$, namely, $x^3 + x + 1$ and $x^3 + x^2 + 1$. To be explicit, we choose $\zeta$ to be a root of the first, so that

$$\zeta^3 = \zeta + 1.$$

$$\begin{array}{c|ccccccc}
+ & 1 & \zeta & \zeta^2 & \zeta^3 & \zeta^4 & \zeta^5 & \zeta^6 \\ \hline
1 & 0 & \zeta^3 & \zeta^6 & \zeta & \zeta^5 & \zeta^4 & \zeta^2 \\
\zeta & & 0 & \zeta^4 & 1 & \zeta^2 & \zeta^6 & \zeta^5 \\
\zeta^2 & & & 0 & \zeta^5 & \zeta & \zeta^3 & 1 \\
\zeta^3 & & & & 0 & \zeta^6 & \zeta^2 & \zeta^4 \\
\zeta^4 & & & & & 0 & 1 & \zeta^3 \\
\zeta^5 & & & & & & 0 & \zeta \\
\zeta^6 & & & & & & & 0
\end{array}$$

Table 4.1: Addition Table for $\mathbb{F}_8$

We now give the addition table for $\mathbb{F}_8$. Since $\mathbb{F}_8$ has characteristic 2, the diagonal entries have the form $\zeta^i + \zeta^i = 2\zeta^i = 0$; since addition is commutative, the addition table is a symmetric matrix, and so it is only necessary to compute its upper triangular half. Let us compute the first row of Table 4.1 consisting of $\zeta^j + 1$ for $1 \le j \le 6$. We have $\zeta + 1 = \zeta^3$, and $\zeta^3 + 1 = (\zeta + 1) + 1 = \zeta$. Next,

$$\zeta^2 + 1 = (\zeta + 1)^2 = (\zeta^3)^2 = \zeta^6;$$

$$\zeta^4 + 1 = (\zeta^2 + 1)^2 = (\zeta^6)^2 = \zeta^{12} = \zeta^5;$$

$$\zeta^6 + 1 = (\zeta^3 + 1)^2 = \zeta^2.$$

It follows that $\zeta^5 + 1 = \zeta^4$, for all the other powers of $\zeta$ have occurred. Now use the first row to compute the second row. For example, $\zeta^j + \zeta = \zeta(\zeta^{j-1} + 1)$.

The reader is invited to verify the remaining entries. ◄
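The addition table can also be generated mechanically. The sketch below represents an element $c_2\zeta^2 + c_1\zeta + c_0$ of $\mathbb{F}_8$ as the 3-bit integer with bits $c_2c_1c_0$ (an encoding chosen here for convenience, not taken from the text), so that addition is bitwise XOR and multiplication by $\zeta$ is a shift followed by reduction with $\zeta^3 = \zeta + 1$.

```python
def f8_powers():
    """Bit patterns of 1, z, ..., z^6 in F_8 = F_2[x]/(x^3 + x + 1); bit i is the coefficient of z^i."""
    powers, value = [], 1
    for _ in range(7):
        powers.append(value)
        value <<= 1                # multiply by z
        if value & 0b1000:         # reduce using z^3 = z + 1
            value ^= 0b1011
    return powers

def addition_table():
    """table[i][j] = exponent k with z^i + z^j = z^k, or None when the sum is 0."""
    powers = f8_powers()
    log = {v: k for k, v in enumerate(powers)}
    return [[log.get(powers[i] ^ powers[j]) for j in range(7)] for i in range(7)]
```

Row 0 of the result lists the exponents of $1 + \zeta^j$, and can be compared with the first row computed by hand above.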

Example 4.119. 

(i) The factorization into irreducibles of $x^7 - 1$ in $\mathbb{F}_2[x]$ is

$$x^7 - 1 = x^7 + 1 = (x + 1)(x^3 + x + 1)(x^3 + x^2 + 1).$$

Let $C$ be the cyclic binary code of length 7 having generating polynomial $g(x) = x^3 + x + 1$. Now one root of $g(x)$ is a primitive 7th root of unity, say, $\zeta$. To find the other roots, we use the division algorithm to obtain

$$g(x) = x^3 + x + 1 = (x + \zeta)(x^2 + \zeta x + \zeta^6).$$

The quadratic formula does not apply over fields of characteristic 2. However, by evaluating the quadratic factor at $\zeta^i$ for each $i$, using Table 4.1, we can show that $\zeta^2$ and $\zeta^4$ are the roots of $x^2 + \zeta x + \zeta^6$. Thus, $\zeta, \zeta^2$ are consecutive roots of $g(x)$, and so $C$ is a BCH-code with $\ell = 1$. In fact, $C$ is a $[7, 4]$-code (for $7 - \deg(g) = 4$) with $d(C) \ge \ell + 2 = 3$. By Corollary 4.113, a generating matrix for $C$ is

$$G = \begin{bmatrix} 1 & 0 & 1 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 & 0 & 1 & 1 \end{bmatrix}.$$

The echelon form of $G$ is

$$G = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 & 1 & 1 & 1 \end{bmatrix}.$$

Thus, $C$ is the Hamming $[7, 4]$-code in Example 4.110, and so this code is a BCH-code. (It can be proved that all the Hamming codes are BCH-codes.)

(ii) Consider the cyclic binary code $C$ of length 7 having generator polynomial

$$g(x) = (x + 1)(x^3 + x + 1) = x^4 + x^3 + x^2 + 1.$$

Now $1 = \zeta^0, \zeta, \zeta^2$ are roots of $g(x)$, and so $C$ is a BCH-code; in fact, $C$ is a $[7, 3]$-code with $d(C) \ge 4$. By Corollary 4.113, a generating matrix for $C$ is

$$G = \begin{bmatrix} 1 & 1 & 1 & 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & 1 & 0 & 1 & 0 \\ 0 & 0 & 1 & 1 & 1 & 0 & 1 \end{bmatrix}.$$

◄
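Finding which powers of $\zeta$ are roots of a generating polynomial is a finite computation in $\mathbb{F}_8$. The sketch below again encodes elements of $\mathbb{F}_8$ as 3-bit integers with $\zeta$ = `0b010` and $\zeta^3 = \zeta + 1$ (an assumption of this illustration), and lists polynomial coefficients from the constant term upward; the function names are hypothetical.

```python
def f8_mul(a, b):
    """Multiply in F_8 = F_2[x]/(x^3 + x + 1), elements as 3-bit integers."""
    result = 0
    for _ in range(3):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000:      # reduce using z^3 = z + 1
            a ^= 0b1011
    return result

def f8_pow(a, e):
    result = 1
    for _ in range(e):
        result = f8_mul(result, a)
    return result

def evaluate(coeffs, x):
    """Evaluate a polynomial (constant term first) at x by Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = f8_mul(acc, x) ^ c
    return acc

def root_exponents(coeffs, zeta=0b010):
    """Exponents i in 0..6 for which zeta^i is a root of the polynomial."""
    return [i for i in range(7) if evaluate(coeffs, f8_pow(zeta, i)) == 0]
```

For $x^3 + x + 1$ the exponents found are 1, 2, 4, so the longest run of consecutive powers is $\zeta, \zeta^2$; for $x^4 + x^3 + x^2 + 1$ they are 0, 1, 2, 4, which contains the run $\zeta^0, \zeta, \zeta^2$ used in part (ii).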


It is not obvious how to compute the degree of the generating polynomial 
of a BCH-code, although there are extensive tables of them. The degrees of 
generating polynomials of the following BCH-codes can be computed easily, 
because a primitive nth root of unity lies in the ground field. 


Definition. A Reed-Solomon code over the finite field $\mathbb{F}_q$ is a BCH-code with generating polynomial

$$g(x) = (x - \zeta)(x - \zeta^2) \cdots (x - \zeta^{d-1}),$$

where $1 < d < q - 1$ and $\zeta$ is a primitive $(q - 1)$th root of unity in $\mathbb{F}_q$.

By Theorem 4.115, we have $d(C) \ge d$ (for $\ell = d - 2$, so that $\ell + 2 = d$) if $C$ is a Reed-Solomon code with minimum distance $d(C)$.




Corollary 4.120 (Reed-Solomon). There exists a $t$-error correcting BCH-code over $\mathbb{F}_{2^{2t+1}}$ with rate $1 - [2t/(2^{2t+1} - 1)]$.

Proof. Let $\zeta$ be a primitive element of $\mathbb{F}_q$, where $q = 2^{2t+1}$, and define

$$g(x) = (x - \zeta)(x - \zeta^2) \cdots (x - \zeta^{2t}) \in \mathbb{F}_q[x].$$

The Reed-Solomon code $C$ with generating polynomial $g(x)$ has $d \ge 2t + 1$, by Theorem 4.115, and so it is a $t$-error correcting code, by Proposition 4.105. Now $\deg(g) = 2t$ and $C$ is a code of length $q - 1 = 2^{2t+1} - 1$, so that $\dim(C) = q - 1 - 2t$, by Corollary 4.112. Thus, the rate of $C$ is $(q - 1 - 2t)/(q - 1) = 1 - [2t/(2^{2t+1} - 1)]$. •
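The construction in this proof can be imitated over a small field. To avoid extension-field arithmetic, the sketch below swaps in the prime field $\mathbb{F}_7$ (not the field $\mathbb{F}_{2^{2t+1}}$ of the corollary), with the primitive 6th root of unity $\zeta = 3$ and $t = 1$; these choices and the function names are assumptions made only for illustration.

```python
from itertools import product

def poly_mul(a, b, p):
    """Product of two polynomials over F_p, coefficient lists with constant term first."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % p
    return out

def reed_solomon_generator(p, zeta, num_roots):
    """g(x) = (x - zeta)(x - zeta^2)...(x - zeta^num_roots) over F_p."""
    g = [1]
    for i in range(1, num_roots + 1):
        g = poly_mul(g, [(-pow(zeta, i, p)) % p, 1], p)
    return g

def minimum_weight(g, n, p):
    """Smallest weight of a nonzero multiple f(x) g(x) with deg(f) < n - deg(g)."""
    k = n - (len(g) - 1)
    best = n
    for msg in product(range(p), repeat=k):
        if any(msg):
            c = poly_mul(list(msg), g, p)   # degree < n, so no reduction mod x^n - 1
            best = min(best, sum(1 for x in c if x))
    return best
```

Here $g(x) = (x - 3)(x - 2)$ has degree $2t = 2$, the code has length 6 and dimension 4 (rate $4/6$), and the enumeration confirms minimum weight 3, in line with the bound $d \ge 2t + 1$.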


Example 4.121. 

The Reed-Solomon code C over F5 having generating polynomial 

g(x) = (x — 2)(x — 2 2 )(x — 2 3 )(x — 2 4 ) = (x — 1 ) (x — 2)(x — 3)(x — 4) 

is a [4, 2]-code over F5 with d(C) > 5, and so it detects up to 4 errors and it 
corrects up to 2 errors. A generating matrix for C is 


Variants of Reed-Solomon codes are used to send photographs from outer space. One reason is that they can correct bursts. Suppose that a string of $s$ consecutive bits in a received message is suspect (perhaps a cosmic ray intercepted a part of the message). Let us illustrate how a Reed-Solomon code can deal with this. Suppose that $C$ is a Reed-Solomon code of length 255 over $\mathbb{F}_{256}$ having generating polynomial $g(x) = \prod_{i=1}^{10} (x - \zeta^i)$. If $\zeta$ is a primitive 255th root of unity, then $C$ corrects up to 5 errors. Each codeword $c \in C$ lies in $(\mathbb{F}_{256})^{255}$, and each letter in $c$ lies in $\mathbb{F}_{256}$. Now construct a binary code which replaces each letter in a codeword in $C$ by a bitstring of length 8. If an interval of length 33, say, in this new code is suspicious, rewrite the word as a word in the original Reed-Solomon code. The rewritten word involves at most 5 errors, because 33 consecutive bits meet at most 5 of the 8-bit letters, and hence it can be corrected. In this way, Reed-Solomon codes use finite fields to correct error bursts.
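The count behind the burst argument, how many 8-bit letters a run of consecutive bits can touch, is a one-line calculation. The sketch below checks every alignment of the burst relative to the letter boundaries; the function name and parameters are hypothetical.

```python
def max_symbols_hit(burst_bits, symbol_bits=8):
    """Largest number of symbols that a run of consecutive bits can overlap."""
    worst = 0
    for offset in range(symbol_bits):          # position of the first bit inside its symbol
        first = offset // symbol_bits
        last = (offset + burst_bits - 1) // symbol_bits
        worst = max(worst, last - first + 1)
    return worst
```

A burst of 33 bits never meets more than 5 letters, while a burst of 34 bits can meet 6, which is why 33 is the natural limit for a code correcting 5 letter errors.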

The next proposition gives a criterion for testing whether a word is a code- 
word. 


Proposition 4.122. If $G = [I \mid B]$ is an $m \times n$ echelon generating matrix of a
linear code C over a field k, then $w \in k^n$ lies in C if and only if $w[-B^T \mid I]^T = 0$.

Proof. If $w \in C$, then w is a linear combination of the rows of G. Now the
ith row of G is $e_iG$, where $e_1, \dots, e_m$ is the standard basis of $k^m$. Hence, if
$G[-B^T \mid I]^T = 0$, then $(e_iG)[-B^T \mid I]^T = 0$ for all i, and so $w[-B^T \mid I]^T = 0$.
It therefore suffices to show that $G[-B^T \mid I]^T = 0$.
Now the ith row of G is $e_iG = e_i[I \mid B] = [e_i \mid e_iB]$; the jth column of $[-B^T \mid I]^T$
is the jth row of $[-B^T \mid I]$, namely $e_j[-B^T \mid I] = [-e_jB^T \mid e_j]$ (here $e_j$ is a
standard basis vector of $k^{n-m}$). Hence,
using the partition notation introduced on page 409 in the special case of $1 \times n$
matrices, the ij entry of $G[-B^T \mid I]^T$ is the dot product

\[ (e_i \mid b_{i1}, \dots, b_{i,n-m}) \cdot (-b_{1j}, \dots, -b_{mj} \mid e_j) = -b_{ij} + b_{ij} = 0. \]

Therefore, $w[-B^T \mid I]^T = 0$.

Conversely, consider the homogeneous system $[-B^T \mid I]x = 0$ and its
solution space $S = \{v \in k^n : [-B^T \mid I]v^T = 0\}$. Now $v \in S$ if and only
if $v[-B^T \mid I]^T = 0$, so that the first part of the proof shows that $C \subseteq S$. But
dim(C) = m, while dim(S) = n - r, where $r = \mathrm{rank}([-B^T \mid I]) = n - m$. By
Theorem 4.42, we have dim(S) = n - r = m, and so C = S, by Corollary 4.25.
Therefore, if $w[-B^T \mid I]^T = 0$, then $w \in S$; hence, $w \in C$. •
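
The criterion of Proposition 4.122 is easy to try out numerically. The sketch below is only an illustration under stated assumptions: the field is $\mathbb{F}_5$, the block $B$ is a made-up $2 \times 2$ matrix, and the helper names `parity_check_matrix`, `syndrome`, and `is_codeword` are ours, not the text's.

```python
def parity_check_matrix(B, p):
    """Return H = [-B^T | I]^T for G = [I | B] over F_p (p prime).

    B is an m x r list of lists with r = n - m; H has n rows and r columns.
    """
    m, r = len(B), len(B[0])
    # Row j of [-B^T | I] is (-b_{1j}, ..., -b_{mj} | e_j).
    rows = [[(-B[i][j]) % p for i in range(m)] + [int(l == j) for l in range(r)]
            for j in range(r)]
    # Transposing gives the n x r matrix H.
    return [list(col) for col in zip(*rows)]


def syndrome(w, H, p):
    """Compute the row vector w H over F_p."""
    return [sum(wi * hi for wi, hi in zip(w, col)) % p for col in zip(*H)]


def is_codeword(w, H, p):
    """Test the criterion w H = 0 of the proposition."""
    return all(s == 0 for s in syndrome(w, H, p))


p = 5
B = [[1, 2], [3, 4]]                 # hypothetical B, so G = [I | B] is 2 x 4
G = [[1, 0, 1, 2], [0, 1, 3, 4]]
H = parity_check_matrix(B, p)

# The word 2*(row 1) + 3*(row 2) of G, reduced mod 5, is a codeword.
w = [(2 * a + 3 * b) % p for a, b in zip(*G)]
print(w, is_codeword(w, H, p))           # [2, 3, 1, 1] True
print(is_codeword([2, 3, 1, 2], H, p))   # False
```

Changing a single coordinate of a codeword produces a nonzero product $wH$, which is how the criterion detects that the altered word is no longer in C.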

Definition. If $G = [I \mid B]$ is an $m \times n$ echelon generating matrix of a linear
code C, then the $n \times (n - m)$ matrix $H = [-B^T \mid I]^T$ is called a parity check matrix
for C.

Remark. If $C \subseteq k^n$ is a code, define its dual code to be the orthogonal
complement

\[ C^{\perp} = \{y \in k^n : (y, c) = 0 \text{ for all } c \in C\}, \]

where $(y, c) = a_1b_1 + \cdots + a_nb_n$ is the usual dot product, $y = (a_1, \dots, a_n)$,
and $c = (b_1, \dots, b_n)$. It turns out that if $G = [I \mid B]$ is a generating matrix for C,
then the parity check matrix $H = [-B^T \mid I]^T$ is a generating matrix for the dual
code $C^{\perp}$. ◄

Until now, we have considered linear codes and their encoding functions. A 
decoding function must, of course, override the errors introduced by a transmis- 
sion function. 

Definition. Let $C \subseteq k^n$ be a linear code. If $y \in k^n$ and $c \in C$, then the error
vector is

\[ e = e(y, c) = y - c. \]

Of course, wt(e) is the number of nonzero coordinates of e; that is, the number of
coordinates in which y and c disagree.


Let us now try to decode a received message in a special case.

Example 4.123.

Suppose that we are using a Reed-Solomon code C of length n over a field k
whose generating polynomial g(x) has $1, \zeta, \zeta^2, \zeta^3$ among its roots, where $\zeta$ is a
primitive nth root of unity lying in k. Assume further that $\zeta \in k$. Thus, $\ell = 3$
and C corrects up to t = 2 errors, by Corollary 4.116. We assume that there is
a (necessarily unique) codeword c so that $\delta(y, c) \le 2$; indeed, we assume that
$\delta(y, c) = 2$, so that y = c + e, where c is a codeword, e is the error vector,
and wt(e) = 2. Can we recover c? As usual, words $y = (a_0, a_1, \dots, a_{n-1})$ of
length n are encoded as polynomials $a_0 + a_1x + \cdots + a_{n-1}x^{n-1}$; a codeword c
corresponds to a polynomial divisible by g(x), and so $1, \zeta, \zeta^2, \zeta^3$ are also roots
of c. This suggests that we try to find c by using a Vandermonde matrix, as in
the proof of Theorem 4.115, the BCH theorem.

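Let

\[ e = y - c = (h_0, h_1, \dots, h_{n-1}). \]

Thus, Supp(e) = {i, j}; we do not know $i$, $j$, $h_i$, $h_j$. As a polynomial,
$e(x) = h_ix^i + h_jx^j$. Define a $3 \times n$ matrix

\[ U = \begin{bmatrix}
1 & 1 & 1 & \cdots & 1 \\
1 & \zeta & \zeta^2 & \cdots & \zeta^{n-1} \\
1 & \zeta^2 & \zeta^4 & \cdots & \zeta^{2(n-1)}
\end{bmatrix}. \]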

Now $Uy^T = [y(1)\ \ y(\zeta)\ \ y(\zeta^2)]^T$ is known. It follows that $Uc^T = 0$, for
$1, \zeta, \zeta^2$ are roots of c (thus, the matrix $U^T$ resembles the parity check matrix
H in Proposition 4.122; however, we do not say, conversely, that if $Uy^T = 0$,
then $y \in C$). Hence, $Uy^T = Uc^T + Ue^T = Ue^T$, and so there is a system of
equations

\[ \begin{aligned}
h_i + h_j &= y(1) \\
h_i\zeta^i + h_j\zeta^j &= y(\zeta) \\
h_i\zeta^{2i} + h_j\zeta^{2j} &= y(\zeta^2).
\end{aligned} \tag{1} \]

Since we do not know i and j, we do not know $\zeta^i$ and $\zeta^j$; we introduce new notation
for them:

\[ z_1 = \zeta^i \quad\text{and}\quad z_2 = \zeta^j. \]

Rewrite the system in this notation:

\[ \begin{aligned}
h_i + h_j &= y(1) \\
h_iz_1 + h_jz_2 &= y(\zeta) \\
h_iz_1^2 + h_jz_2^2 &= y(\zeta^2).
\end{aligned} \]

Consider the polynomial $f(x) = (x - z_1)(x - z_2) = x^2 + ax + b$. Of course,
$f(z_1) = 0 = f(z_2)$, and we use these equations to find a and b. Now

\[ \begin{aligned}
b\,y(1) &= b(h_i + h_j) = bh_i + bh_j \\
a\,y(\zeta) &= a(h_iz_1 + h_jz_2) = ah_iz_1 + ah_jz_2 \\
y(\zeta^2) &= h_iz_1^2 + h_jz_2^2.
\end{aligned} \]

Adding the columns,

\[ b\,y(1) + a\,y(\zeta) + y(\zeta^2) = h_if(z_1) + h_jf(z_2) = 0. \]

We claim that $b\,y(\zeta) + a\,y(\zeta^2) + y(\zeta^3) = 0$. Since $\zeta^3$ is also a root of c, we have
$y(\zeta^3) = e(\zeta^3)$, and so

\[ \begin{aligned}
b\,y(\zeta) &= bh_iz_1 + bh_jz_2 \\
a\,y(\zeta^2) &= ah_iz_1^2 + ah_jz_2^2 \\
y(\zeta^3) &= h_iz_1^3 + h_jz_2^3.
\end{aligned} \]

Hence,

\[ \begin{aligned}
b\,y(\zeta) + a\,y(\zeta^2) + y(\zeta^3) &= h_i(bz_1 + az_1^2 + z_1^3) + h_j(bz_2 + az_2^2 + z_2^3) \\
&= h_iz_1(b + az_1 + z_1^2) + h_jz_2(b + az_2 + z_2^2) \\
&= 0.
\end{aligned} \]


We are going to solve for a and b. In fact, (a, b) is a solution of the linear system

\[ \begin{bmatrix} h_i + h_j & h_iz_1 + h_jz_2 \\ h_iz_1 + h_jz_2 & h_iz_1^2 + h_jz_2^2 \end{bmatrix}
\begin{bmatrix} b \\ a \end{bmatrix}
= -\begin{bmatrix} y(\zeta^2) \\ y(\zeta^3) \end{bmatrix}, \tag{2} \]

whose coefficient matrix has the known entries $y(1)$, $y(\zeta)$, $y(\zeta^2)$.
But the coefficient matrix can be factored:

\[ \begin{bmatrix} h_i + h_j & h_iz_1 + h_jz_2 \\ h_iz_1 + h_jz_2 & h_iz_1^2 + h_jz_2^2 \end{bmatrix}
= \begin{bmatrix} 1 & 1 \\ z_1 & z_2 \end{bmatrix}
\begin{bmatrix} h_i & 0 \\ 0 & h_j \end{bmatrix}
\begin{bmatrix} 1 & z_1 \\ 1 & z_2 \end{bmatrix}. \]

Now all the $2 \times 2$ submatrices of the $2 \times n$ matrix

\[ V = \begin{bmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & \zeta & \zeta^2 & \cdots & \zeta^{n-1} \end{bmatrix} \]

are nonsingular. In particular, the coefficient matrix of Eq. (2) is of the form
$S\,\mathrm{diag}(h_i, h_j)\,S^T$, where S is such a submatrix, so that its determinant is
$h_ih_j\det(S)^2 \ne 0$, and the system can be solved for a and
b (note that $h_i \ne 0$ and $h_j \ne 0$ because wt(e) = 2). We can now find $z_1$ and
$z_2$, for they are the roots of $f(x) = x^2 + ax + b$, and they are known to lie
in $k = \mathbb{F}_q$. If q is odd, we find $z_1$ and $z_2$ by the quadratic formula; if q is even,
then we can find the roots by trial and error (there are only q - 1 candidates).
But $z_1 = \zeta^i$ and $z_2 = \zeta^j$, and so we can find i and j. Finally, we can find $h_i$
and $h_j$ from the first two equations in Eq. (1), for the coefficient matrix
$\begin{bmatrix} 1 & 1 \\ z_1 & z_2 \end{bmatrix}$
is nonsingular. In sum, we have used the Vandermonde-like matrices U and V
to find the error vector e whose only nonzero coordinates are $h_i$ and $h_j$. One
decodes y by defining D(y) = w, where c = y - e, c = E(w), and E is
the encoding function. The method gives the codeword c by correcting y, and
codewords can always be decoded. ◄
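
The procedure of Example 4.123 can be carried out by machine in a small case. What follows is a sketch under assumptions of our own choosing, not data from the text: the field is $\mathbb{F}_7$, the length is n = 6, and $\zeta = 3$, which is a primitive 6th root of unity mod 7; the generating polynomial is then $g(x) = (x-1)(x-3)(x-2)(x-6) = x^4 + 2x^3 + 5x^2 + 5x + 1$. The function names are hypothetical, and the roots of $f(x)$ are found by trial and error rather than by the quadratic formula.

```python
P, N, ZETA = 7, 6, 3      # F_7, length 6, and 3 is a primitive 6th root of unity mod 7


def evaluate(word, x):
    """Evaluate a_0 + a_1 x + ... + a_{n-1} x^{n-1} in F_P by Horner's rule."""
    value = 0
    for a in reversed(word):
        value = (value * x + a) % P
    return value


def inverse(a):
    """Multiplicative inverse in F_P via Fermat's little theorem."""
    return pow(a % P, P - 2, P)


def decode_two_errors(y):
    """Return (c, e) with y = c + e, assuming exactly two coordinates are wrong."""
    s0, s1, s2, s3 = (evaluate(y, pow(ZETA, k, P)) for k in range(4))
    det = (s0 * s2 - s1 * s1) % P
    if det == 0:
        raise ValueError("the received word does not have exactly two errors")
    # Cramer's rule for  [s0 s1; s1 s2] (b, a)^T = (-s2, -s3)^T, which is Eq. (2).
    b = (s1 * s3 - s2 * s2) * inverse(det) % P
    a = (s1 * s2 - s0 * s3) * inverse(det) % P
    # Trial and error: the error positions are the i with f(zeta^i) = 0.
    positions = [i for i in range(N)
                 if (pow(ZETA, 2 * i, P) + a * pow(ZETA, i, P) + b) % P == 0]
    if len(positions) != 2:
        raise ValueError("f(x) does not have two roots among the powers of zeta")
    i, j = positions
    z1, z2 = pow(ZETA, i, P), pow(ZETA, j, P)
    # First two equations of Eq. (1): h_i + h_j = s0 and h_i z1 + h_j z2 = s1.
    hj = (s1 - s0 * z1) * inverse(z2 - z1) % P
    hi = (s0 - hj) % P
    e = [0] * N
    e[i], e[j] = hi, hj
    c = [(yk - ek) % P for yk, ek in zip(y, e)]
    return c, e


c = [1, 5, 5, 2, 1, 0]        # coefficients of g(x), constant term first
y = [1, 1, 5, 2, 6, 0]        # c with errors 3 and 5 added in positions 1 and 4
print(decode_two_errors(y))   # ([1, 5, 5, 2, 1, 0], [0, 3, 0, 0, 5, 0])
```

For this received word the computed values are $y(1) = 1$, $y(\zeta) = 1$, $y(\zeta^2) = 2$, $y(\zeta^3) = 2$, which give $f(x) = x^2 + 5 = (x - 3)(x - 4)$; since $3 = \zeta^1$ and $4 = \zeta^4$, the errors sit in positions 1 and 4.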


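We now generalize Example 4.123 to decode Reed-Solomon codes.

Let C be a Reed-Solomon code of length n over a field k with generating
polynomial g(x), and assume that consecutive powers $1, \zeta, \zeta^2, \dots, \zeta^{2t-1}$ occur
among the roots of g(x), where $\zeta$ is a primitive nth root of unity lying in the
field k. We define a $(t + 1) \times n$ matrix U:

\[ U = \begin{bmatrix}
1 & 1 & 1 & \cdots & 1 \\
1 & \zeta & \zeta^2 & \cdots & \zeta^{n-1} \\
1 & \zeta^2 & \zeta^4 & \cdots & \zeta^{2(n-1)} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & \zeta^t & \zeta^{2t} & \cdots & \zeta^{t(n-1)}
\end{bmatrix} \tag{3} \]

and a $t \times n$ matrix V:

\[ V = \begin{bmatrix}
1 & 1 & 1 & \cdots & 1 \\
1 & \zeta & \zeta^2 & \cdots & \zeta^{n-1} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & \zeta^{t-1} & \zeta^{2(t-1)} & \cdots & \zeta^{(n-1)(t-1)}
\end{bmatrix}. \tag{4} \]

Recall that if $u = (a_0, a_1, \dots, a_{n-1})$ and $v = (b_0, b_1, \dots, b_{n-1})$ are vectors
in $k^n$, then their Hadamard product is defined by

\[ u \circ v = (a_0b_0, a_1b_1, \dots, a_{n-1}b_{n-1}). \]

It is easy to see that

\[ (u + u') \circ v = u \circ v + u' \circ v. \]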

Definition. Let C be a Reed-Solomon code of length n which corrects up to
t errors. If $[\mathrm{ROW}_U(i) \circ \mathrm{ROW}_V(\mu)]c^T = 0$ for all $i$, $\mu$ and all $c \in C$, then the
matrices U and V in Eqs. (3) and (4) are called the decoding pair for C.

Lemma 4.124. Let C be a Reed-Solomon code of length n over a field k whose
generating polynomial has $1, \zeta, \dots, \zeta^{2t-1}$ among its roots, where $\zeta$ is a
primitive nth root of unity lying in k, and 2t - 1 < n.

(i) The matrices U and V in Eqs. (3) and (4) form a decoding pair for C; that
is,

\[ [\mathrm{ROW}_U(i) \circ \mathrm{ROW}_V(\mu)]c^T = 0 \]

for all $i$ and $\mu$ and every codeword $c \in C$.

(ii) rank(U) = t + 1.

(iii) Every t columns of V form a linearly independent list.

Proof.

(i) Now

\[ \mathrm{ROW}_U(i) = (1\ \ \zeta^i\ \ \zeta^{2i}\ \ \dots\ \ \zeta^{(n-1)i}), \]

where $0 \le i \le t$, and

\[ \mathrm{ROW}_V(\mu) = (1\ \ \zeta^{\mu}\ \ \zeta^{2\mu}\ \ \dots\ \ \zeta^{(n-1)\mu}), \]

where $0 \le \mu < t$. Therefore,

\[ \mathrm{ROW}_U(i) \circ \mathrm{ROW}_V(\mu) = (1\ \ \zeta^{i+\mu}\ \ \dots\ \ \zeta^{(n-1)(i+\mu)}). \]

Since $i + \mu \le 2t - 1 < n$, we have $\zeta^{i+\mu}$ a root of g(x), and hence it is a root of any
codeword c = c(x). Therefore,

\[ [\mathrm{ROW}_U(i) \circ \mathrm{ROW}_V(\mu)]c^T = c(\zeta^{i+\mu}) = 0. \]

(ii) It suffices to show that U has a nonzero $(t + 1) \times (t + 1)$ minor, by Exercise 4.49
on page 399. But the first t + 1 columns of U form a Vandermonde
matrix $V(1, \zeta, \zeta^2, \dots, \zeta^t)$ which is nonsingular, by Exercise 4.45 on page 398.

(iii) Columns $j_1, \dots, j_t$ form a Vandermonde matrix $V(\zeta^{j_1}, \zeta^{j_2}, \dots, \zeta^{j_t})$, and
this matrix is nonsingular because $\zeta^{j_1}, \zeta^{j_2}, \dots, \zeta^{j_t}$ are all distinct. It follows
that the corresponding columns form a linearly independent list. •

Theorem 4.125. Let C be a Reed-Solomon code of length n over a field k
whose generating polynomial has $1, \zeta, \dots, \zeta^{2t-1}$ among its roots, where $\zeta$
is a primitive nth root of unity lying in k, and 2t - 1 < n.

Let $y = (y_0, \dots, y_{n-1}) = c + e$, where $c \in C$ and $\mathrm{wt}(e) \le t$. If (U, V) is
the decoding pair for C, then there exists a nonzero vector $u = (u_0, \dots, u_{n-1}) \in
\mathrm{Row}(U)$ which is a solution of

\[ \sum_{i=0}^{n-1} (y_i\zeta^{i\mu})u_i = 0 \quad\text{for all } \mu = 0, 1, \dots, t - 1. \tag{5} \]

Moreover, $u \in \mathrm{Row}(U)$ is a solution of Eq. (5) if and only if

\[ e \circ u = (0, \dots, 0). \]

Proof. Eq. (5) is a homogeneous system Au = 0, where A is the $t \times n$ matrix
whose $\mu$th row, for $\mu = 0, 1, \dots, t - 1$, is the Hadamard product $y \circ \mathrm{ROW}_V(\mu)$
[because $(1\ \ \zeta^{\mu}\ \ \dots\ \ \zeta^{(n-1)\mu}) = \mathrm{ROW}_V(\mu)$]. If rank(A) = r, then $r \le t$ and so
$\dim(\mathrm{Sol}(A)) = n - r \ge n - t$. If $\mathrm{Sol}(A) \cap \mathrm{Row}(U) = \{0\}$, then Exercise 4.19
on page 345 gives

\[ \begin{aligned}
\dim[\mathrm{Sol}(A) + \mathrm{Row}(U)] &= \dim[\mathrm{Sol}(A)] + \dim[\mathrm{Row}(U)] \\
&= (n - r) + (t + 1) \\
&\ge (n - t) + (t + 1) \\
&> n,
\end{aligned} \]

contradicting $\mathrm{Sol}(A) + \mathrm{Row}(U) \subseteq k^n$. Therefore, $\mathrm{Sol}(A) \cap \mathrm{Row}(U) \ne \{0\}$,
and so there exist nonzero vectors $u \in \mathrm{Sol}(A) \cap \mathrm{Row}(U)$, as claimed.

We claim that, for $u \in \mathrm{Row}(U)$, Eq. (5) has the same solutions as

\[ \sum_{i=0}^{n-1} (h_i\zeta^{i\mu})u_i = 0 \quad\text{for all } \mu = 0, 1, \dots, t - 1, \tag{6} \]

where $e = (h_0, h_1, \dots, h_{n-1})$. The left side of Eq. (5) is $[u \circ \mathrm{ROW}_V(\mu)]y^T$, and the
left side of Eq. (6) is $[u \circ \mathrm{ROW}_V(\mu)]e^T$. If c is a codeword with y = c + e, then

\[ \begin{aligned}
[\mathrm{ROW}_U(i) \circ \mathrm{ROW}_V(\mu)]y^T &= [\mathrm{ROW}_U(i) \circ \mathrm{ROW}_V(\mu)]c^T + [\mathrm{ROW}_U(i) \circ \mathrm{ROW}_V(\mu)]e^T \\
&= [\mathrm{ROW}_U(i) \circ \mathrm{ROW}_V(\mu)]e^T,
\end{aligned} \]

because $[\mathrm{ROW}_U(i) \circ \mathrm{ROW}_V(\mu)]c^T = 0$, by Lemma 4.124. Since u is a linear
combination of the rows of U and the Hadamard product is linear in each variable,
$[u \circ \mathrm{ROW}_V(\mu)]y^T = [u \circ \mathrm{ROW}_V(\mu)]e^T$. Hence, if u is a solution
of Eq. (5), then u is a solution of Eq. (6). Similarly, any solution of Eq. (6) is a
solution of Eq. (5).

Let $u \in \mathrm{Row}(U)$. If $e \circ u = (0, \dots, 0)$, then $h_iu_i = 0$ for all i, and so
$\sum_{i=0}^{n-1} (h_i\zeta^{i\mu})u_i = \sum_{i=0}^{n-1} (h_iu_i)\zeta^{i\mu} = 0$ for all $\mu = 0, 1, \dots, t - 1$. Therefore,
u satisfies Eq. (6), and hence it satisfies Eq. (5).

Conversely, if u satisfies Eq. (6), then $\sum_{i=0}^{n-1} (h_iu_i)\zeta^{i\mu} = 0$ for all $\mu =
0, 1, \dots, t - 1$. Thus, $\sum_{i=0}^{n-1} (h_iu_i)f_i = 0$ (of course, 0 denotes the zero column),
where $f_0, \dots, f_{n-1}$ are the columns of V. Now $\mathrm{wt}(e) \le t$, so that there are at most
t nonzero $h_i$; call them $h_{i_1}, \dots, h_{i_t}$. It follows that $\sum_{\nu=1}^{t} h_{i_\nu}u_{i_\nu}f_{i_\nu} = 0$. But
any t columns of V form a linearly independent list, by Lemma 4.124. Hence,
$h_{i_\nu}u_{i_\nu} = 0$ for all $\nu = 1, \dots, t$. Now $h_iu_i = 0$ for all $i \ne i_\nu$, because $h_i = 0$ for
these indices, so that $h_iu_i = 0$ for all i. Therefore, $e \circ u = (0, \dots, 0)$. •

Remark. Vectors u given in Theorem 4.125 should be viewed as error locator
vectors. If $i \in \mathrm{Supp}(e)$, then $h_i \ne 0$. But $e \circ u = (h_0u_0, \dots, h_{n-1}u_{n-1}) =
(0, \dots, 0)$ implies that $u_i = 0$. Thus, $\mathrm{Supp}(e) \subseteq Z(u)$, where

\[ Z(u) = \{\text{indices } i : u_i = 0\}. \]

Thus, finding u corresponds to finding the indices i and j in Example 4.123.
Note, however, that the inclusion $\mathrm{Supp}(e) \subseteq Z(u)$ may be proper: it is possible that u gives "extra"
indices in Z(u) that do not correspond to error positions. ◄

Let (U, V) be the decoding pair for a Reed-Solomon code C of length n
whose generating polynomial has $1, \zeta, \zeta^2, \dots, \zeta^{2t-1}$ among its roots, where $\zeta$
is a primitive nth root of unity lying in k, and 2t - 1 < n. Let $y \in k^n$ be a word.
If e has weight $\le t$, then there is a unique codeword c with y = c + e which can
be found by solving the system of linear equations

\[ y - e \in C \quad\text{and}\quad e \circ u = 0. \]

Of course, the condition $y - e \in C$ can be written explicitly using the rows of
a generating matrix of C. Thus, Reed-Solomon codes can be decoded by linear
algebra.
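
To see error locator vectors concretely, one can search Row(U) exhaustively in a small case. The sketch below rests on illustrative assumptions that are not in the text: the field is $\mathbb{F}_7$, n = 6, $\zeta = 3$, t = 2, and the codeword is the generating polynomial $g(x) = x^4 + 2x^3 + 5x^2 + 5x + 1$ itself. The name `error_locators` is ours, and brute force stands in for solving the linear system.

```python
from itertools import product

P, N, ZETA, T = 7, 6, 3, 2      # F_7, length 6, zeta = 3, t = 2 (illustrative choices)

# Rows of U are indexed by i = 0..t, rows of V by mu = 0..t-1, as in Eqs. (3) and (4).
U = [[pow(ZETA, i * j, P) for j in range(N)] for i in range(T + 1)]
V = [[pow(ZETA, mu * j, P) for j in range(N)] for mu in range(T)]


def error_locators(y):
    """All nonzero u in Row(U) with sum_j y_j zeta^(j*mu) u_j = 0 for every mu < t."""
    found = []
    for coeffs in product(range(P), repeat=T + 1):
        # u is the linear combination of the rows of U with these coefficients.
        u = [sum(a * row[j] for a, row in zip(coeffs, U)) % P for j in range(N)]
        if any(u) and all(
            sum(y[j] * V[mu][j] * u[j] for j in range(N)) % P == 0
            for mu in range(T)
        ):
            found.append(u)
    return found


e = [0, 3, 0, 0, 5, 0]                     # errors in positions 1 and 4
c = [1, 5, 5, 2, 1, 0]                     # a codeword, constant term first
y = [(a + b) % P for a, b in zip(c, e)]
locators = error_locators(y)
zero_sets = {tuple(j for j in range(N) if u[j] == 0) for u in locators}
print(len(locators), zero_sets)            # 6 {(1, 4)}
```

Here every solution is a nonzero multiple of the values of $(x - 3)(x - 4)$ at the powers of $\zeta$, so $Z(u)$ is exactly the set of error positions; in general, as the Remark warns, $Z(u)$ may contain extra indices.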

Suppose that C is a BCH-code over a field k. If one redefines a decoding 
pair of matrices (U, V ) as matrices with entries in k satisfying the conclusions 
of Lemma 4.124, then it can be shown that every BCH-code has a decoding pair 
which can be used to decode it. 


Exercises 

4.60 Let A be an alphabet with $|A| = q \ge 2$, let $T : A^n \to A^n$ be a transmission
function, and let the probability of error in each letter of a transmitted word be p,
where 0 < p < 1.

(i) Prove that the probability of the occurrence of exactly t erroneous letters 
in a transmitted word of length n is 

(■T r )' 11 

(ii) Prove that the probability is 

G >‘< 1 

and conclude that this probability is independent of q . 

*4.61 Prove that $d \ge 3$, where d is the minimum distance of the two-dimensional parity
code in Example 4.102(iii).

*4.62 Let A be an alphabet with |A| = q, and let $C \subseteq A^n$ be an (n, k, d)-code.

(i) Define $\pi : C \to A^{n-d+1}$ by $\pi(c_1, \dots, c_n) = (c_d, \dots, c_n)$. Prove that $\pi$
is an injection.

(ii) (Singleton Bound.) Prove that

\[ k \le q^{n-d+1}. \]

*4.63 If A is an alphabet with |A| = q, and if $u \in A^n$, define the (closed) ball of radius
r with center u by

\[ B_r(u) = \{w \in A^n : \delta(w, u) \le r\}, \]

where $\delta$ is the Hamming distance.

(i) Prove that

\[ |\{w \in A^n : \delta(u, w) = i\}| = \binom{n}{i}(q - 1)^i. \]

(ii) Prove that

\[ |B_r(u)| = \sum_{i=0}^{r} \binom{n}{i}(q - 1)^i. \]

(iii) (Gilbert-Varshamov Bound.) If $C \subseteq A^n$ is an (n, k, d)-code, where
|A| = q, prove that


. , < k. 

Eto (■)(<? -1)'' _ 

*4.64 (Hamming Bound.) If $C \subseteq A^n$ is an (n, k, d)-code, where |A| = q and d = 2t + 1,
prove that

\[ k \le \frac{q^n}{\sum_{i=0}^{t} \binom{n}{i}(q - 1)^i}. \]

4.65 An (n, k, d)-code over an alphabet A with |A| = q is called a perfect code if it
attains the Hamming bound:

\[ k = \frac{q^n}{\sum_{i=0}^{t} \binom{n}{i}(q - 1)^i}. \]

Prove that the Hamming $[2^{\ell} - 1, 2^{\ell} - 1 - \ell]$-codes in Example 4.110 are perfect
codes.

4.66 If $C \subseteq F^n$ is a linear code and $w \in F^n$, define $r = \min_{c \in C} \delta(w, c)$. Give an
example of a linear code $C \subseteq F^n$ and a word $w \in F^n$ with $w \notin C$ such that
there are distinct codewords $c, c' \in C$ with $\delta(w, c) = r = \delta(w, c')$. Conclude that
correcting a transmitted word by choosing the codeword nearest to it may not be
well-defined.

4.67 Let C be an [n, m]-linear code over a finite field F, and let G be a generating matrix
of C. Prove that an $m \times n$ matrix A is also a generating matrix of C if and only if
A = GH for some matrix $H \in \mathrm{GL}(n, F)$.

4.68 Prove that the BCH-code of length m + 1 over $\mathbb{F}_2$ having generating polynomial
x - 1 is the [m + 1, m]-parity check code.

4.69 (i) Write $x^{15} - 1$ as a product of irreducible polynomials in $\mathbb{F}_2[x]$.

(ii) Find an irreducible quartic polynomial $g(x) \in \mathbb{F}_2[x]$, and use it to define
a primitive 15th root of unity $\zeta \in \mathbb{F}_{16}$.

(iii) Find a BCH-code C over $\mathbb{F}_2$ of length 15 having minimum distance
$d(C) \ge 3$.

4.70 Let C be the Reed-Solomon [4, 2]-code in Example 4.121. Decode the received
word y = (3, 3, 1, 1).




Fields 


The study of the roots of polynomials is intimately related to the study of fields.
If $f(x) \in k[x]$, where k is a field, then it is natural to consider the relation
between k and the larger field E, where E is obtained from k by adjoining all the
roots of f(x). For example, if E = k, then f(x) is a product of linear factors in
k[x]. We shall see that the pair E and k has a Galois group, Gal(E/k), and that
this group determines whether there exists a formula for the roots of f(x) which
generalizes the quadratic formula.


5.1 Classical Formulas 

Revolutionary events were changing the western world in the early 1500s: the 
printing press had just been invented; trade with Asia and Africa was flourishing; 
Columbus had just discovered the New World; and Martin Luther was challeng- 
ing papal authority. The Reformation and the Renaissance were beginning. 

The Italian peninsula was not one country but a collection of city states with
many wealthy and cosmopolitan traders. Public mathematics contests, sponsored
by the dukes of the cities, were an old tradition; there are records from 1225 of
Leonardo of Pisa (c. 1180-c. 1245), also called Fibonacci, approximating roots
of $x^3 + 2x^2 + 10x - 20$ with good accuracy. One of the problems frequently set
involved finding roots of a given cubic equation

\[ X^3 + bX^2 + cX + d = 0, \]

where b, c, and d were real numbers, usually integers.¹

¹ Around 1074, Omar Khayyam (1048-1123), a Persian mathematician now more famous
for his poetry, used intersections of conic sections to give geometric constructions of roots of
cubics.

Modern notation did not exist in the early 1500s, and so the feat of finding
the roots of a cubic involved not only mathematical ingenuity but also an ability
to surmount linguistic obstacles. Designating variables by letters was invented
in 1591 by F. Viète (1540-1603), who used consonants to denote constants and
vowels to denote variables (the modern notation of using letters a, b, c, ... at
the beginning of the alphabet to denote constants and letters x, y, z at the end of
the alphabet to denote variables was introduced in 1637 by R. Descartes in his
book La Geometrie). The exponential notation $A^2, A^3, A^4, \dots$ was essentially
introduced by J. Hume in 1636 (he used $A^{ii}, A^{iii}, A^{iv}, \dots$). The symbols +, -,
and √, as well as the symbol / for division, as in a/b, were introduced by
J. Widman in 1486. The symbol × for multiplication was introduced by W.
Oughtred in 1631, and the symbol ÷ for division by J. H. Rahn in 1659. The
symbol = was introduced by the Oxford don Robert Recorde in 1557, in his
Whetstone of Wit:

And to avoide the tediouse repetition of these woordes: is equal to: 

I will lette as I doe often in woorke use, a paire of paralleles, or 
gemowe lines of one lengthe, thus: =, because noe 2 thynges, can 
be moare equalle. 

(Gemowe is an obsolete word meaning twin or, in this case, parallel.) These
symbols were not adopted at once, and often there were competing notations.
Most of this notation did not become universal in Europe until the next century,
with the publication of Descartes's La Geometrie.

Let us return to cubic equations. The lack of good notation was a great
handicap. For example, the cubic equation $X^3 + 2X^2 + 4X - 1 = 0$ would be
given, roughly, as follows:

Take the cube of a thing, add to it twice the square of the thing, to
this add 4 times the thing, and this must all be set equal to 1.

Complicating matters even more, negative numbers were not accepted; an equation
of the form $X^3 - 2X^2 - 4X + 1 = 0$ would only be given in the form
$X^3 + 1 = 2X^2 + 4X$. Thus, there were many forms of cubic equations, depending
(in our notation) on whether coefficients were positive, negative, or zero.

About 1515, Scipione del Ferro of Bologna discovered a method for finding 
the roots of several forms of a cubic. Given the competitive context, it was 
natural for him to keep his method secret. Before his death in 1526, Scipione 
shared his result with some of his students. 

The following history is from the excellent account in J.-P. Tignol, Galois'
Theory of Equations.

In 1535, Niccolo Fontana (c. 1500-1557), nicknamed "Tartaglia"
("Stammerer"), from Brescia, who had dealt with some very particular
cases of cubic equations, was challenged to a problem-solving
contest by Antonio Maria Fior, a former pupil of Scipione del Ferro.

When he heard that Fior had received the solution of cubic equations
from his master, Tartaglia threw all his energy and skill into the
struggle. He succeeded in finding the solution just in time to inflict
upon Fior a humiliating defeat.

The news that Tartaglia had found the solution of cubic equations
reached Girolamo Cardano (1501-1576), a very versatile scientist,
who wrote a number of books on a wide variety of subjects, including
medicine, astrology, astronomy, philosophy, and mathematics.
Cardano then asked Tartaglia to give him his solution, so that he
could include it in a treatise on arithmetic, but Tartaglia flatly refused,
since he was himself planning to write a book on this topic.

It turns out that Tartaglia later changed his mind, at least partially,
since in 1539 he handed on to Cardano the solution of $x^3 + qx = r$,
$x^3 = qx + r$, and a very brief indication of $x^3 + r = qx$ in verses.

. . . Having received Tartaglia's poem, Cardano set to work. Not only
did he find justifications for the formulas, but he also solved all the
other types of cubics. He then published his results, giving due credit
to Tartaglia and to del Ferro, in the epoch-making book Ars Magna,
sive de regulis algebraicis (The Great Art, or the Rules of Algebra).

Let us now derive the formulas for the roots of polynomials of low degree.
The usual way to derive the quadratic formula is by "completing the square,"
which can be taken literally. Consider the quadratic equation $x^2 + bx + c = 0$
with b > 0, and view $x^2 + bx$ as the area pictured in Figure 5.1. One completes
the square by adding on the corner square having area $\tfrac14 b^2$. The area of the
large square is $(x + \tfrac12 b)^2$; if $-c + \tfrac14 b^2 \ge 0$, then we have constructed a square
with sides of length $x + \tfrac12 b$ and area $-c + \tfrac14 b^2$. This geometric construction can
be done algebraically without assuming that certain quantities are non-negative.

Figure 5.1 Completing the Square

Let $f(x) = x^2 + bx + c$. Then

\[ \begin{aligned}
x^2 + bx + c &= x^2 + bx + \tfrac14 b^2 + c - \tfrac14 b^2 \\
&= (x + \tfrac12 b)^2 + \tfrac14(4c - b^2).
\end{aligned} \]

Therefore, if z is a root of f(x), then

\[ z + \tfrac12 b = \pm\tfrac12\sqrt{b^2 - 4c}. \]

We now present a different derivation of the quadratic formula which begins 
by replacing a given polynomial by a simpler one. 

Definition. A polynomial $f(x) \in \mathbb{R}[x]$ of degree n is reduced if it has no $x^{n-1}$
term; that is, $f(x) = a_nx^n + a_{n-2}x^{n-2} + \cdots + a_0$.

Lemma 5.1. The substitution $X = x - \tfrac1n a_{n-1}$ changes

\[ f(X) = X^n + a_{n-1}X^{n-1} + h(X), \]

where h(X) = 0 or $\deg(h) \le n - 2$, into a reduced polynomial

\[ \tilde f(x) = f(x - \tfrac1n a_{n-1}); \]

moreover, if u is a root of $\tilde f(x)$, then $u - \tfrac1n a_{n-1}$ is a root of f(X).

Proof. The substitution $X = x - \tfrac1n a_{n-1}$ gives

\[ \begin{aligned}
\tilde f(x) &= f(x - \tfrac1n a_{n-1}) \\
&= (x - \tfrac1n a_{n-1})^n + a_{n-1}(x - \tfrac1n a_{n-1})^{n-1} + h(x - \tfrac1n a_{n-1}) \\
&= \bigl(x^n - a_{n-1}x^{n-1} + g_1(x)\bigr) + a_{n-1}\bigl(x^{n-1} + g_2(x)\bigr) + h(x - \tfrac1n a_{n-1}) \\
&= x^n + g_1(x) + a_{n-1}g_2(x) + h(x - \tfrac1n a_{n-1}),
\end{aligned} \]

where each of $g_1(x)$, $g_2(x)$, $h(x - \tfrac1n a_{n-1})$, and $g_1(x) + a_{n-1}g_2(x) + h(x - \tfrac1n a_{n-1})$
is either 0 or a polynomial of degree $\le n - 2$. Obviously, the polynomial
$\tilde f(x) = f(x - \tfrac1n a_{n-1})$ is reduced.

Finally, if u is a root of $\tilde f(x)$, then $0 = \tilde f(u) = f(u - \tfrac1n a_{n-1})$; that is,
$u - \tfrac1n a_{n-1}$ is a root of f(X). •
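
The substitution of Lemma 5.1 is mechanical, and a short computation makes it concrete. This is a minimal sketch assuming a monic polynomial with integer coefficients stored from the constant term upward; the function names `shift` and `reduce_monic` are ours, and the sample cubic is chosen only for illustration.

```python
from fractions import Fraction


def shift(coeffs, s):
    """Coefficients of f(x - s); lists run from the constant term upward."""
    result = [Fraction(coeffs[-1])]
    for a in reversed(coeffs[:-1]):
        # Horner's rule in the variable (x - s): result <- result * (x - s) + a.
        times_x = [Fraction(0)] + result
        times_s = [s * c for c in result] + [Fraction(0)]
        result = [u - v for u, v in zip(times_x, times_s)]
        result[0] += a
    return result


def reduce_monic(coeffs):
    """Apply X = x - a_{n-1}/n to a monic polynomial; return (reduced, a_{n-1}/n)."""
    n = len(coeffs) - 1
    s = Fraction(coeffs[n - 1], n)
    return shift(coeffs, s), s


# f(X) = X^3 + 6X^2 + 3X - 10 = (X - 1)(X + 2)(X + 5)
reduced, s = reduce_monic([-10, 3, 6, 1])
print([str(c) for c in reduced], s)   # ['0', '-9', '0', '1'] 2
```

The reduced polynomial is $x^3 - 9x$, with roots 3, 0, -3; subtracting $\tfrac13 a_2 = 2$ from each gives the roots 1, -2, -5 of $f(X)$, as the lemma predicts.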

Here is another proof of the quadratic formula.

Corollary 5.2 (Quadratic Formula). If $f(X) = X^2 + bX + c$, then its roots
are

\[ \tfrac12\bigl(-b \pm \sqrt{b^2 - 4c}\bigr). \]

Proof. Define x by $X = x - \tfrac12 b$. Now

\[ \tilde f(x) = (x - \tfrac12 b)^2 + b(x - \tfrac12 b) + c. \]

The linear terms cancel, the reduced polynomial is

\[ \tilde f(x) = x^2 - \tfrac14(b^2 - 4c), \]

and the roots of $\tilde f(x)$ are $u = \pm\tfrac12\sqrt{b^2 - 4c}$. But Lemma 5.1 says that the roots
of f(X) are $u - \tfrac12 b$; that is, the roots of f(X) are $\tfrac12(-b \pm \sqrt{b^2 - 4c})$. •

The following consequence of the quadratic formula will be used in deriving 
the cubic formula. 

Corollary 5.3. Given numbers c and d, there exist numbers $\alpha$ and $\beta$ with
$\alpha + \beta = c$ and $\alpha\beta = d$.

Proof. If d = 0, choose $\alpha = 0$ and $\beta = c$. If $d \ne 0$, then $\alpha \ne 0$ and we may
set $\beta = d/\alpha$. Substituting, $c = \alpha + \beta = \alpha + d/\alpha$, so that

\[ \alpha^2 - c\alpha + d = 0. \]

The quadratic formula now shows that such an $\alpha$ exists, as does $\beta = d/\alpha$. (Of
course, $\alpha$ and $\beta$ might be complex numbers.) •

Lemma 5.1 simplifies the original polynomial and, at the same time, keeps
control of its roots. In particular, if n = 3, then $\tilde f(x)$ has the form $x^3 + qx + r$.
The "trick" in solving a reduced cubic is to write a root u of $x^3 + qx + r$ as

\[ u = \alpha + \beta, \]

and then to find $\alpha$ and $\beta$. Now

\[ \begin{aligned}
0 &= u^3 + qu + r \\
&= (\alpha + \beta)^3 + q(\alpha + \beta) + r.
\end{aligned} \]

Note that

\[ \begin{aligned}
(\alpha + \beta)^3 &= \alpha^3 + 3\alpha^2\beta + 3\alpha\beta^2 + \beta^3 \\
&= \alpha^3 + \beta^3 + 3\alpha\beta(\alpha + \beta) \\
&= \alpha^3 + \beta^3 + 3\alpha\beta u.
\end{aligned} \]

Therefore, $0 = \alpha^3 + \beta^3 + 3\alpha\beta u + qu + r$, and so

\[ 0 = \alpha^3 + \beta^3 + u(3\alpha\beta + q) + r. \tag{1} \]

We have already set $\alpha + \beta = u$; by Corollary 5.3, we may impose a second
condition

\[ \alpha\beta = -\tfrac13 q, \tag{2} \]

which makes the u term in Eq. (1) go away, leaving

\[ \alpha^3 + \beta^3 = -r. \tag{3} \]

Cubing each side of Eq. (2) gives

\[ \alpha^3\beta^3 = -\tfrac1{27}q^3. \tag{4} \]

Equations (3) and (4) in the two unknowns $\alpha^3$ and $\beta^3$ can be solved, as in
Corollary 5.3. Substituting² $\beta^3 = -q^3/(27\alpha^3)$ into Eq. (3) gives

\[ \alpha^3 - \frac{q^3}{27\alpha^3} = -r, \]

which may be rewritten as

\[ \alpha^6 + r\alpha^3 - \tfrac1{27}q^3 = 0, \tag{5} \]

a quadratic $y^2 + ry - \tfrac1{27}q^3$ in $\alpha^3$. The quadratic formula gives

\[ \alpha^3 = \tfrac12\bigl(-r + \sqrt{D}\bigr), \tag{6} \]

where $D = r^2 + \tfrac4{27}q^3$. Note that $\beta^3$ is also a root of the quadratic in Eq. (5), so
that

\[ \beta^3 = \tfrac12\bigl(-r - \sqrt{D}\bigr). \tag{7} \]

Now take a cube root³ to obtain $\alpha$. By Eq. (2), $\beta = -q/(3\alpha)$, and so $u = \alpha + \beta$
has been found.
What are the other two roots? Theorem 3.49 says that if u is a root of a
polynomial f(x), then f(x) = (x - u)g(x) for some polynomial g(x). After
finding one root $u = \alpha + \beta$, divide $x^3 + qx + r$ by x - u, and use the quadratic
formula on the quadratic quotient g(x) to find the other two roots [any root of
g(x) is a root of f(x)].

² If $\alpha = 0$, then q = 0, and the polynomial is $f(x) = x^3 + r$; of course, the roots in
this case are the cube roots of -r.

³ The number $z = \tfrac12(-r + \sqrt{D})$ might be complex. The easiest way to find a cube root of
z is to write it in polar form $se^{i\theta}$, where $s \ge 0$; a cube root is then $\sqrt[3]{s}\,e^{i\theta/3}$.

Here is an explicit formula for the other two roots of f(x) (instead of the
method just given for finding them). There are three cube roots of unity, namely,
1, $\omega = -\tfrac12 + i\tfrac{\sqrt3}{2}$, and $\omega^2 = -\tfrac12 - i\tfrac{\sqrt3}{2}$. It follows that the other cube roots of $\alpha^3$
are $\omega\alpha$ and $\omega^2\alpha$. If $\beta$ is the "mate" of $\alpha$, that is, if $\beta = -q/(3\alpha)$, as in Eq. (2),
then the mate of $\omega\alpha$ is

\[ -q/(3\omega\alpha) = \beta/\omega = \omega^2\beta, \]

and the mate of $\omega^2\alpha$ is

\[ -q/(3\omega^2\alpha) = \beta/\omega^2 = \omega\beta. \]

Therefore, explicit formulas for the roots of f(x) are $\alpha + \beta$, $\omega\alpha + \omega^2\beta$, and
$\omega^2\alpha + \omega\beta$.

We have proved the cubic formula (also called Cardano's formula).

Theorem 5.4 (Cubic Formula). The roots of $x^3 + qx + r$ are

\[ \alpha + \beta, \quad \omega\alpha + \omega^2\beta, \quad\text{and}\quad \omega^2\alpha + \omega\beta, \]

where $\alpha^3 = \tfrac12(-r + \sqrt{D})$, $\beta = -\tfrac{q}{3\alpha}$, $D = r^2 + \tfrac4{27}q^3$, and $\omega = -\tfrac12 + i\tfrac{\sqrt3}{2}$ is a
cube root of unity.

Proof. We have just given the proof when $\alpha \ne 0$. By Eq. (2), we have $\alpha\beta =
-q/3$, and so $\alpha = 0$ forces q = 0; that is, the reduced cubic is $x^3 + r$. In this
case, $\beta^3 = -r$, the roots are $\beta$, $\omega\beta$, and $\omega^2\beta$, and the cubic formula holds in this
case as well. •

Recall that Eq. (7) gives $\beta^3 = \tfrac12(-r - \sqrt{D})$.
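
The cubic formula translates directly into a few lines of floating-point code. The following is a sketch rather than a robust solver: the names `cube_root` and `cubic_roots` are ours, the cube root is taken from the polar form as in footnote 3, and rounding error means the outputs are only approximate (for extreme coefficients the subtraction $-r + \sqrt{D}$ can lose accuracy).

```python
import cmath


def cube_root(z):
    """One complex cube root of z, computed from the polar form of z."""
    modulus, angle = cmath.polar(z)
    return cmath.rect(modulus ** (1 / 3), angle / 3)


def cubic_roots(q, r):
    """Roots of x^3 + q x + r as (a + b, w a + w^2 b, w^2 a + w b), a = alpha, b = beta."""
    w = complex(-0.5, 3 ** 0.5 / 2)        # the cube root of unity omega
    if q == 0:
        # The case alpha = 0 of the theorem: the roots are the cube roots of -r.
        alpha, beta = 0, cube_root(-r)
    else:
        D = r * r + 4 * q ** 3 / 27
        alpha = cube_root((-r + cmath.sqrt(D)) / 2)
        beta = -q / (3 * alpha)            # the mate of alpha, from Eq. (2)
    return (alpha + beta, w * alpha + w * w * beta, w * w * alpha + w * beta)


for root in cubic_roots(-15, -126):
    print(root)        # approximately 6, -3 + 3.4641i, -3 - 3.4641i
```

For $x^3 - 15x - 126$ the computation gives $D = 15376$, $\alpha \approx 5$, and $\beta \approx 1$, so the first root is approximately 6 and the other two are approximately $-3 \pm 2i\sqrt3$.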

Example 5.5 (Good Example). 

We find the roots of $x^3 - 15x - 126$. The polynomial is already reduced, for there
is no $x^2$ term, and so it is in the form to which the cubic formula applies (were
it not reduced, one would first reduce it, as in Lemma 5.1). Here, q = -15,
r = -126, $D = (-126)^2 + \tfrac4{27}(-15)^3 = 15{,}376$, and $\sqrt{D} = 124$. Hence, $\alpha^3 =
\tfrac12[-(-126) + 124] = 125$ and $\alpha = 5$, while $\beta = -q/(3\alpha) = 15/(3 \cdot 5) = 1$. The
roots are thus $\alpha + \beta = 6$, $\omega\alpha + \omega^2\beta = -3 + 2i\sqrt3$, and $\omega^2\alpha + \omega\beta = -3 - 2i\sqrt3$.
Alternatively, having found u = 6 to be a root, the division algorithm gives

\[ x^3 - 15x - 126 = (x - 6)(x^2 + 6x + 21), \]

and the quadratic formula gives $-3 \pm 2i\sqrt3$ as the roots of the quadratic factor.

◄

The quadratic and cubic formulas are not valid over arbitrary coefficient
fields. For example, since 2 = 0 in fields k of characteristic 2, the quadratic
formula does not make sense for quadratics in k[x] because $\tfrac12$ is not defined.
Similarly, the cubic formula (and the quartic formula below) does not apply to
polynomials with coefficients in fields of characteristic 2 or characteristic 3
because the formulas involve $\tfrac12$ and $\tfrac13$, one of which is not defined in these fields.

Definition. If u, v, and w are the roots of $f(x) = x^3 + qx + r$, let $\Delta =
(u - v)(u - w)(v - w)$, and define

\[ \Delta^2 = [(u - v)(u - w)(v - w)]^2; \]

the number $\Delta^2$ is called the discriminant⁴ of f(x).

It is natural to consider $\Delta^2$ instead of $\Delta$, for $\Delta$ is a number depending not
only on the roots but also on the order in which they are listed. Had we listed the
roots as u, w, v, for example, then $(u - w)(u - v)(w - v) = -\Delta$, because the
factor $w - v = -(v - w)$ has changed sign. Squaring eliminates this difference.

Note that if $\Delta^2 = 0$, then $\Delta = 0$ and the cubic has a repeated root. Can
we detect this without first computing the roots? The cubic formula allows us to
compute $\Delta^2$ in terms of q and r.

Lemma 5.6. The discriminant $\Delta^2$ of $f(x) = x^3 + qx + r$ is

\[ \Delta^2 = -27r^2 - 4q^3 = -27D. \]

Proof. If the roots of f(x) are u, v, and w, then the cubic formula gives

\[ u = \alpha + \beta; \quad v = \omega\alpha + \omega^2\beta; \quad w = \omega^2\alpha + \omega\beta, \]

where $\omega = -\tfrac12 + i\tfrac{\sqrt3}{2}$, $D = r^2 + \tfrac4{27}q^3$, $\alpha = [\tfrac12(-r + \sqrt{D})]^{1/3}$, and $\beta = -\tfrac{q}{3\alpha}$.
One checks easily that:

\[ \begin{aligned}
u - v &= \alpha + \beta - \omega\alpha - \omega^2\beta = (1 - \omega)(\alpha - \omega^2\beta); \\
u - w &= \alpha + \beta - \omega^2\alpha - \omega\beta = -\omega^2(1 - \omega)(\alpha - \omega\beta); \\
v - w &= \omega\alpha + \omega^2\beta - \omega^2\alpha - \omega\beta = \omega(1 - \omega)(\alpha - \beta).
\end{aligned} \]

Therefore,

\[ \Delta = -\omega^3(1 - \omega)^3(\alpha - \beta)(\alpha - \omega\beta)(\alpha - \omega^2\beta). \]

Of course, $-\omega^3 = -1$, while

\[ (1 - \omega)^3 = 1 - 3\omega + 3\omega^2 - \omega^3 = -3(\omega - \omega^2). \]

But $\omega = -\tfrac12 + i\tfrac{\sqrt3}{2}$ and $\omega^2 = \overline{\omega} = -\tfrac12 - i\tfrac{\sqrt3}{2}$, so that $(1 - \omega)^3 =
-3(\omega - \omega^2) = -i3\sqrt3$, and

\[ -\omega^3(1 - \omega)^3 = i3\sqrt3. \]

Finally, Exercise 3.78 on page 278 gives

\[ (\alpha - \beta)(\alpha - \omega\beta)(\alpha - \omega^2\beta) = \alpha^3 - \beta^3 = \sqrt{D}. \]

Therefore, $\Delta = i3\sqrt3\sqrt{D}$, and

\[ \Delta^2 = -27D = -27r^2 - 4q^3. \quad • \]

⁴ More generally, if $f(x) = (x - u_1)(x - u_2) \cdots (x - u_n)$ is a polynomial of degree n, then
the discriminant of f(x) is defined to be $\Delta^2$, where $\Delta = \prod_{i<j}(u_i - u_j)$ (one takes i < j
so that each difference $u_i - u_j$ occurs just once in the product). In particular, the quadratic
formula shows that the discriminant of $x^2 + bx + c$ is $b^2 - 4c$.

It follows, for example, that the cubic formula is not needed to see that the
cubic $f(x) = x^3 - 3x + 2$ has a repeated root, for $-27r^2 - 4q^3 = 0$. It also
follows that if $f(x) \in k[x]$, then its discriminant lies in k as well.

We are now going to use the discriminant to detect whether the roots of a 
cubic are all real. 


Lemma 5.7. Every $f(x) \in \mathbb{R}[x]$ of odd degree has a real root.


Remark. The proof we give assumes that f(x) has a complex root (which
follows from the Fundamental Theorem of Algebra). ◄

Proof. The proof is by induction on $n \ge 0$, where deg(f) = 2n + 1. The base
step n = 0 is obviously true. Let $n \ge 1$ and let u be a complex root of f(x). If u
is real, we are done. Otherwise u = a + ib, and Exercise 5.7 on page 447 shows
that the complex conjugate $\overline{u} = a - ib$ is also a root; moreover, $u \ne \overline{u}$ because
u is not real. Both x - u and $x - \overline{u}$ are divisors of f(x); as these divisors are
relatively prime, their product is also a divisor; there is a factorization in $\mathbb{C}[x]$

\[ f(x) = (x - u)(x - \overline{u})g(x). \]

Now $(x - u)(x - \overline{u}) = x^2 - 2ax + a^2 + b^2 \in \mathbb{R}[x]$, and so the division algorithm
gives $g(x) = f(x)/(x - u)(x - \overline{u}) \in \mathbb{R}[x]$. Since deg(g) = (2n + 1) - 2 =
2(n - 1) + 1, the inductive hypothesis says that g(x), and hence f(x), has a real
root. •

Theorem 5.8. All the roots u, v, w of $x^3 + qx + r \in \mathbb{R}[x]$ are real numbers if
and only if the discriminant $\Delta^2 \ge 0$; that is, $27r^2 + 4q^3 \le 0$.

Proof. If u, v, and w are real numbers, then $\Delta = (u - v)(u - w)(v - w)$ is a
real number. Therefore, $-27r^2 - 4q^3 = \Delta^2 \ge 0$, and $27r^2 + 4q^3 \le 0$.

Conversely, assume that w = s + ti is not real (i.e., $t \ne 0$); by Exercise 5.7
on page 447, the complex conjugate of a root is also a root, say, v = s - ti; by
Lemma 5.7, the other root u is real. Now

\[ \begin{aligned}
\Delta &= (u - s + ti)(u - s - ti)(s - ti - [s + ti]) \\
&= (-2ti)[(u - s)^2 + t^2].
\end{aligned} \]

Since u, s, and t are real numbers,

\[ \begin{aligned}
\Delta^2 &= (-2ti)^2[(u - s)^2 + t^2]^2 \\
&= 4t^2i^2[(u - s)^2 + t^2]^2 \\
&= -4t^2[(u - s)^2 + t^2]^2 < 0,
\end{aligned} \]

and so $0 > \Delta^2 = -27r^2 - 4q^3$. We have shown that if there is a nonreal root,
then $27r^2 + 4q^3 > 0$; equivalently, if there is no nonreal root (i.e., if all the roots
are real), then $27r^2 + 4q^3 \le 0$. •
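
Lemma 5.6 and Theorem 5.8 give a test that needs no root finding, and with integer coefficients it can be done in exact arithmetic. The following sketch uses function names of our own (`discriminant`, `classify`, `delta_squared`); the case of discriminant 0 relies on the observation that a real cubic with a nonreal root has three distinct roots.

```python
def discriminant(q, r):
    """Discriminant -4q^3 - 27r^2 of the reduced cubic x^3 + q x + r."""
    return -4 * q ** 3 - 27 * r ** 2


def classify(q, r):
    """Describe the roots of a real reduced cubic using Lemma 5.6 and Theorem 5.8."""
    d = discriminant(q, r)
    if d > 0:
        return "three distinct real roots"
    if d == 0:
        return "all roots real, with a repeated root"
    return "one real root and two nonreal conjugate roots"


def delta_squared(u, v, w):
    """The discriminant computed directly from the roots."""
    return ((u - v) * (u - w) * (v - w)) ** 2


print(discriminant(-7, 6), delta_squared(1, 2, -3))   # 400 400
print(classify(-3, 2))      # all roots real, with a repeated root
print(classify(-15, -126))  # one real root and two nonreal conjugate roots
```

The first line checks the lemma on $x^3 - 7x + 6 = (x - 1)(x - 2)(x + 3)$, where both computations give 400; the second confirms the repeated root of $x^3 - 3x + 2$ mentioned after Lemma 5.6.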

Example 5.9 (Bad Example). 

In Example 5.5, the cubic formula gave the roots of $x^3 - 15x - 126$ in a routine
way. Let us now try the cubic formula on the polynomial

$$x^3 - 7x + 6 = (x - 1)(x - 2)(x + 3)$$

whose roots are, obviously, 1, 2, and $-3$. There is no $x^2$ term, $q = -7$, $r = 6$,
and $D = r^2 + \frac{4}{27}q^3 = -\frac{400}{27} < 0$ (notice that $27D = 27r^2 + 4q^3$ is negative, as
Theorem 5.8 predicts). The cubic formula gives a messy answer: the roots are

$$\alpha + \beta, \quad \omega\alpha + \omega^2\beta, \quad \omega^2\alpha + \omega\beta,$$

where $\alpha^3 = \frac{1}{2}\left(-6 + \sqrt{-\frac{400}{27}}\right)$ and $\beta^3 = \frac{1}{2}\left(-6 - \sqrt{-\frac{400}{27}}\right)$. Something strange
has happened. There are three curious equations saying that each of 1, 2, and $-3$
is equal to one of the messy expressions displayed above; thus, for example,

$$\alpha + \beta = \sqrt[3]{\tfrac{1}{2}\left(-6 + \sqrt{-\tfrac{400}{27}}\right)} + \sqrt[3]{\tfrac{1}{2}\left(-6 - \sqrt{-\tfrac{400}{27}}\right)}$$

is equal to 1, 2, or $-3$. Aside from the complex cube roots of unity, this expression
involves square roots of the negative number $-\frac{400}{27}$.
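The "messy" expressions can at least be evaluated in floating point. The sketch below assumes $q \neq 0$, takes the principal complex cube root for $\alpha$, and sets $\beta = -q/(3\alpha)$ so that $\alpha\beta = -q/3$; the function name `cardano_roots` is ours, and the pairing of the cube roots is our choice for illustration.

```python
import cmath


def cardano_roots(q, r):
    """Roots of x^3 + q*x + r via the cubic formula, assuming q != 0."""
    D = r * r + 4 * q**3 / 27                       # D as in Example 5.9
    alpha = (0.5 * (-r + cmath.sqrt(D))) ** (1 / 3)  # principal cube root
    beta = -q / (3 * alpha)                         # forces alpha * beta = -q/3
    omega = cmath.exp(2j * cmath.pi / 3)            # primitive cube root of unity
    return [alpha + beta,
            omega * alpha + omega**2 * beta,
            omega**2 * alpha + omega * beta]


# For x^3 - 7x + 6 the three complex expressions collapse,
# up to rounding error, to the real roots of the polynomial.
for z in cardano_roots(-7, 6):
    print(z)
```

Even though every intermediate quantity is nonreal, the imaginary parts of the results cancel up to rounding, which is the phenomenon the example describes.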
Until the Middle Ages, mathematicians had no difficulty in ignoring square
roots of negative numbers when dealing with quadratic equations. For example, 
consider the problem of finding the sides x and y of a rectangle having area A 
and perimeter p. The equations 

$$xy = A \quad\text{and}\quad 2x + 2y = p$$

lead to the quadratic equation $2x^2 - px + 2A = 0$, and, as in Corollary 5.3, the
quadratic formula gives the roots

$$x = \tfrac{1}{4}\left(p \pm \sqrt{p^2 - 16A}\right).$$

If $p^2 - 16A \geq 0$, one has found $x$ (and $y$); if $p^2 - 16A < 0$, one merely says
that there is no rectangle whose perimeter and area are in this relation. But the 
cubic formula does not allow one to discard “imaginary” roots, for we have just 
seen that an “honest” real and positive root, even a positive integer, can appear 
in terms of complex numbers.⁵ The Pythagoreans in ancient Greece considered
number to mean positive integer. By the Middle Ages, number came to mean 
positive real number (although there was little understanding of what real num- 
bers are). The importance of the cubic formula in the history of mathematics is 
that it forced mathematicians to take both complex numbers and negative num- 
bers seriously. 

The physicist R. P. Feynman (1918-1988), one of the first winners of the 
annual Putnam national mathematics competition (and also a Nobel laureate in 
physics), suggested another possible value of the cubic formula. As mentioned 
earlier, the cubic formula was found in 1515, a time of great change. One of 
the factors contributing to the Dark Ages was an almost slavish worship of the 
classical Greek and Roman civilizations. It was believed that that earlier era 
had been the high point of man’s accomplishments; contemporary man was in- 
ferior to his forebears (a world view opposite to the modern one of continual 
progress!). The cubic formula was essentially the first instance of a mathemat- 
ical formula unknown to the ancients, and so it may well have been a powerful 
example showing that sixteenth-century man was the equal of his ancestors. ◄ 

The quartic formula, discovered by Lodovico Ferrari (1522-1565) in the
early 1540s, also appeared in Cardano’s book, but it was given much less at- 
tention there than the cubic formula. The reason given by Cardano is that cu- 
bic polynomials have an interpretation as volumes, whereas quartic polynomials 
have no such obvious justification. Cardano wrote, 

⁵ We saw a similar phenomenon in Theorem 1.12: the integer terms of the Fibonacci
sequence are given in terms of $\sqrt{5}$.
As the first power refers to a line, the square to a surface, and the
cube to a solid body, it would be very foolish for us to go beyond 
this point. Nature does not permit it. Thus, . . . , all those matters up 
to and including the cubic are fully demonstrated, but for the others 
which we will add, we do not go beyond barely setting out. 

We present the derivation of the quartic formula given by Descartes. 


Theorem 5.10 (Quartic Formula). There is a method to compute the four
roots of a quartic

$$X^4 + bX^3 + cX^2 + dX + e.$$

Proof. As with the cubic, the quartic can be simplified, by setting $X = x - \frac{1}{4}b$,
to

$$x^4 + qx^2 + rx + s; \tag{8}$$

moreover, if a number $u$ is a root of the second polynomial, then $u - \frac{1}{4}b$ is a
root of the first.

Factor the quartic in Eq. (8) into quadratics:

$$x^4 + qx^2 + rx + s = (x^2 + jx + \ell)(x^2 - jx + m) \tag{9}$$

(the coefficient of $x$ in the second factor is $-j$ because the quartic has no $x^3$
term). If $j$, $\ell$, and $m$ can be found, then the quadratic formula can be used to find
the roots of the quartic in Eq. (8).

Expanding the right-hand side of Eq. (9) and equating coefficients of like
terms gives the equations

$$\begin{aligned}
m + \ell - j^2 &= q \\
j(m - \ell) &= r \\
\ell m &= s.
\end{aligned} \tag{10}$$

Adding and subtracting the top two equations in Eqs. (10) yield

$$\begin{aligned}
2m &= j^2 + q + r/j \\
2\ell &= j^2 + q - r/j.
\end{aligned} \tag{11}$$

Now substitute these into the bottom equation of Eqs. (10):

$$\begin{aligned}
4s = 4\ell m &= (j^2 + q + r/j)(j^2 + q - r/j) \\
             &= (j^2 + q)^2 - r^2/j^2 \\
             &= j^4 + 2j^2 q + q^2 - r^2/j^2.
\end{aligned}$$
Clearing denominators and transposing gives

$$j^6 + 2qj^4 + (q^2 - 4s)j^2 - r^2 = 0, \tag{12}$$

a cubic equation in $j^2$. The cubic formula allows one to solve for $j^2$, and one
then finds $\ell$ and $m$ using Eqs. (11). •
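The last two steps of the proof can be sketched in code. The sketch assumes that a nonzero root $j$ of Eq. (12) is already known (found by the cubic formula or by inspection); the function names `resolvent` and `quartic_roots_from_j`, and the variable name `ell` for $\ell$, are ours.

```python
import cmath


def resolvent(q, r, s, j):
    """Left-hand side of Eq. (12) evaluated at j."""
    return j**6 + 2 * q * j**4 + (q * q - 4 * s) * j**2 - r * r


def quartic_roots_from_j(q, r, s, j):
    """Roots of x^4 + q x^2 + r x + s, given a nonzero root j of Eq. (12)."""
    m = (j * j + q + r / j) / 2      # Eqs. (11)
    ell = (j * j + q - r / j) / 2
    roots = []
    # Quadratic formula applied to x^2 + j x + ell and to x^2 - j x + m.
    for b, c in ((j, ell), (-j, m)):
        d = cmath.sqrt(b * b - 4 * c)
        roots += [(-b + d) / 2, (-b - d) / 2]
    return ell, m, roots


# The quartic x^4 - 2x^2 + 8x - 3, for which j = 2 satisfies Eq. (12).
print(resolvent(-2, 8, -3, 2))
print(quartic_roots_from_j(-2, 8, -3, 2))
```

Here the cubic in $j^2$ is not solved by the program; only the passage from $j$ to $\ell$, $m$, and the four roots is illustrated.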

Example 5.11. 

Consider

$$x^4 - 2x^2 + 8x - 3 = 0,$$

so that $q = -2$, $r = 8$, and $s = -3$. If we factor this quartic into

$$(x^2 + jx + \ell)(x^2 - jx + m),$$

then Eq. (12) gives

$$j^6 - 4j^4 + 16j^2 - 64 = 0.$$

One could use the cubic formula to find $j^2$, but this would be very tedious, for
one must first get rid of the $j^4$ term before doing the rest of the calculations. It
is simpler, in this case, to observe that $j = 2$ is a root, for the equation can be
rewritten

$$j^6 - 4j^4 + 16j^2 - 64 = j^6 - 2^2 j^4 + 2^4 j^2 - 2^6 = 0$$

(many elementary texts are fond of saying, in such circumstances, that $j = 2$ is
found “by inspection”). We now find $\ell$ and $m$ using Eqs. (11):

$$\begin{aligned}
2\ell &= 4 - 2 - (8/2) = -2 \\
2m &= 4 - 2 + (8/2) = 6.
\end{aligned}$$

Thus, the original quartic factors into

$$(x^2 + 2x - 1)(x^2 - 2x + 3).$$

The quadratic formula now gives the roots of the quartic:

$$-1 + \sqrt{2}, \quad -1 - \sqrt{2}, \quad 1 + i\sqrt{2}, \quad 1 - i\sqrt{2}.$$ ◄

Do not be misled by this example; it is difficult to find a quartic whose roots 
are given by the quartic formula in recognizable form. For example, the quartic 
formula gives complicated versions of the roots of $x^4 - 25x^2 + 60x - 36 =
(x - 1)(x - 2)(x - 3)(x + 6)$, as the reader may check.

It is now very tempting, as it was for our ancestors, to seek the roots of a 
quintic $g(X) = X^5 + bX^4 + cX^3 + dX^2 + eX + f$. Begin by reducing the
polynomial with the substitution $X = x - \frac{1}{5}b$. It is natural to expect that some
further ingenious substitution together with the formulas for the roots of lower 
degree polynomials will yield the roots of g(X). But quintics resisted all such 
attempts for almost 300 years. We shall continue this story later in this chapter. 
Viète’s Cubic Formula

A formula involving extraction of roots is not necessarily the simplest way to
find the roots of a cubic. We shall now give another formula for the roots of
$x^3 + qx + r$, due to Viète, which replaces the operations of extraction of roots
(which are, after all, “infinitary” in the sense that their evaluation requires limits)
by evaluation of cosines. By Corollary 1.23, we have

$$\cos(3\theta) = 4\cos^3\theta - 3\cos\theta.$$

It follows that one root of the cubic

$$y^3 - \tfrac{3}{4}y - \tfrac{1}{4}\cos(3\theta) \tag{13}$$

is $u = \cos\theta$. By Exercise 5.9 on page 447, the other two roots of this particular
cubic are $u = \cos(\theta + 120°)$ and $u = \cos(\theta + 240°)$.

Now let $f(x) = x^3 + qx + r$ be a cubic all of whose roots are real (Theorem 5.8
gives a way of checking when this is the case). We want to force $f(x)$
to look like Eq. (13). If $v$ is a root of $f(x)$, set

$$v = tu,$$

where $t$ and $u$ will be chosen⁶ in a moment. Substituting,

$$0 = f(tu) = t^3u^3 + qtu + r,$$

and so

$$u^3 + (q/t^2)u + r/t^3 = 0;$$

that is, $u$ is a root of $g(y) = y^3 + (q/t^2)y + r/t^3$. If we can choose $t$ so that

$$q/t^2 = -\tfrac{3}{4} \tag{14}$$

and

$$r/t^3 = -\tfrac{1}{4}\cos(3\theta) \tag{15}$$

for some $\theta$, then $g(y) = y^3 - \frac{3}{4}y - \frac{1}{4}\cos(3\theta)$ and its roots are

$$u = \cos\theta, \quad u = \cos(\theta + 120°), \quad u = \cos(\theta + 240°).$$

But if $u^3 + (q/t^2)u + r/t^3 = 0$, then $t^3u^3 + qtu + r = 0$; that is, the roots $v = tu$
of $f(x) = x^3 + qx + r = 0$ are

$$v = t\cos\theta, \quad v = t\cos(\theta + 120°), \quad v = t\cos(\theta + 240°).$$

⁶ Scipio’s trick writes a root as a sum $\alpha + \beta$, while Viète’s trick writes it as a product.
We now find $t$ and $u$. Eq. (14) gives $t^2 = -4q/3$, and so

$$t = \sqrt{-4q/3}. \tag{16}$$

One immediate consequence of Theorem 5.8, $27r^2 + 4q^3 \leq 0$, is

$$4q^3 \leq -27r^2;$$

as the right side is negative, $q$ must also be negative. Therefore $-4q/3$ is positive,
and $\sqrt{-4q/3}$ is a real number. Eq. (15) gives

$$\cos(3\theta) = -4r/t^3,$$

and this determines $\theta$ if $|-4r/t^3| \leq 1$. Since $27r^2 \leq -4q^3$, we have $9r^2/q^2 \leq
-4q/3$; taking square roots,

$$\left|\frac{3r}{q}\right| \leq \sqrt{-4q/3} = t,$$

because Eq. (16) gives $t = \sqrt{-4q/3}$. Now $t^2 = -4q/3$, and so

$$\left|\frac{-4r}{t^3}\right| = \left|\frac{-4r}{(-4q/3)t}\right| = \left|\frac{3r}{q}\right| \cdot \frac{1}{t} \leq 1,$$

as desired. We have proved the following theorem.


Theorem 5.12 (Viète). Let $f(x) = x^3 + qx + r$ be a cubic polynomial for
which $27r^2 + 4q^3 \leq 0$. If $t = \sqrt{-4q/3}$ and $\cos 3\theta = -4r/t^3$, then the roots of
$f(x)$ are

$$t\cos\theta, \quad t\cos(\theta + 120°), \quad\text{and}\quad t\cos(\theta + 240°).$$


Example 5.13. 

Consider once again the cubic $x^3 - 7x + 6 = (x - 1)(x - 2)(x + 3)$ that
was discussed in Example 5.9; of course, its roots are 1, 2, and $-3$. The cubic
formula gave rather complicated expressions for these roots in terms of cube
roots of complex numbers involving $\sqrt{-\frac{400}{27}}$. Let us now find the roots using
Theorem 5.12 (which applies because $27r^2 + 4q^3 = -400 < 0$). We first
compute $t$ and $\theta$:

$$t = \sqrt{-4q/3} = \sqrt{-4(-7)/3} = \sqrt{28/3} \approx 3.055$$

and

$$\cos(3\theta) = -4r/t^3 \approx -24/(3.055)^3 \approx -.842;$$

since $\cos(3\theta) \approx -.842$, a trigonometric table gives $3\theta \approx 148°$ and

$$\theta \approx 49°.$$

The roots of the cubic equation are, approximately,

$$3.055\cos 49°, \quad 3.055\cos 169°, \quad\text{and}\quad 3.055\cos 289°.$$

These are good approximations to the true answers. Using a trigonometric table
once again, we find that

$$\begin{aligned}
\cos 49° &\approx .656 &&\text{and}\quad 3.055\cos 49° \approx 2.004 \approx 2.00; \\
\cos 169° &\approx -.982 &&\text{and}\quad 3.055\cos 169° \approx -3.00; \\
\cos 289° &\approx .326 &&\text{and}\quad 3.055\cos 289° \approx .996 \approx 1.00.
\end{aligned}$$ ◄
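The computation in this example can be repeated without a trigonometric table. The following is a minimal sketch of Theorem 5.12 in floating point; the function name `viete_roots` is ours, angles are in radians rather than degrees, and the clamping of the cosine value is our own guard against rounding error.

```python
import math


def viete_roots(q, r):
    """Real roots of x^3 + q*x + r when q < 0 and 27 r^2 + 4 q^3 <= 0."""
    if q >= 0 or 27 * r**2 + 4 * q**3 > 0:
        raise ValueError("Theorem 5.12 needs q < 0 and 27 r^2 + 4 q^3 <= 0")
    t = math.sqrt(-4 * q / 3)
    c = max(-1.0, min(1.0, -4 * r / t**3))   # clamp against rounding error
    theta = math.acos(c) / 3
    # 120 degrees is 2*pi/3 radians
    return [t * math.cos(theta + k * 2 * math.pi / 3) for k in range(3)]


# x^3 - 7x + 6, the cubic of Example 5.13
print(viete_roots(-7, 6))
```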

Remark. By Lemma 5.7, every cubic $f(x) \in \mathbb{R}[x]$ has a real root; a variation
of the proof of Viète’s theorem shows how to find it in case $f(x)$ has complex
roots; that is, when the discriminant condition is

$$-4q^3 < 27r^2.$$

Recall the hyperbolic functions

$$\cosh\theta = \tfrac{1}{2}\left(e^{\theta} + e^{-\theta}\right)$$

and

$$\sinh\theta = \tfrac{1}{2}\left(e^{\theta} - e^{-\theta}\right).$$

One can prove that $\cosh\theta \geq 1$ for all $\theta$, while $\sinh\theta$ can take on any real number
as a value.

These functions satisfy cubic equations (see Exercise 5.8 on page 447):

$$\cosh(3\theta) = 4\cosh^3(\theta) - 3\cosh(\theta)$$

and

$$\sinh(3\theta) = 4\sinh^3(\theta) + 3\sinh(\theta).$$

From the first of these cubic equations, we see that $h(y) = y^3 - \frac{3}{4}y - \frac{1}{4}\cosh(3\theta)$
has $u = \cosh(\theta)$ as a root. To force $f(x) = x^3 + qx + r$ to look like $h(y)$, we
write the real root $v$ of $f(x)$ as $v = tu$. As in the proof of Viète’s theorem, we
want $t^2 = -4q/3$ and $\cosh(3\theta) = -4r/t^3$.

If $-4q/3 > 0$, then $t$ is real. Using the discriminant condition $-4q^3 < 27r^2$,
one can show that $|-4r/t^3| \geq 1$; replacing $t$ by $-t$ if necessary, we may assume
that $-4r/t^3 \geq 1$, and so there is a number $\varphi$ with $\cosh(\varphi) = -4r/t^3$. It follows
that the real root of $f(x)$ is given by

$$v = t\cosh(\varphi/3),$$

where $t = \pm\sqrt{-4q/3}$, with the sign chosen as above. [Of course, the other two
(complex) roots of $f(x)$ are the roots of the quadratic $f(x)/(x - v)$.]

If $-4q/3 < 0$, then we use the hyperbolic sine. We know that $\sinh(\theta)$ is a
root of $k(y) = y^3 + \frac{3}{4}y - \frac{1}{4}\sinh(3\theta)$. To force $f(x)$ to look like $k(y)$, we write
the real root $v$ of $f(x)$ as $v = tu$, where $t = \sqrt{4q/3}$ (our present hypothesis
gives $4q/3 > 0$) and $\sinh(3\theta) = -4r/t^3$. As we remarked earlier, there exists a
number $\gamma$ with $\sinh(\gamma) = -4r/t^3$, and so the real root of $f(x)$ in this case is

$$v = t\sinh(\gamma/3).$$ ◄
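Both cases of this remark can be sketched in a few lines. The function name `real_root_hyperbolic` is ours, and the sketch assumes $q \neq 0$. In the cosh case it picks the sign of $t$ so that $-4r/t^3$ is positive, which is our own way of making $\cosh(\varphi) = -4r/t^3$ solvable for either sign of $r$.

```python
import math


def real_root_hyperbolic(q, r):
    """A real root of x^3 + q*x + r when q != 0 and 27 r^2 + 4 q^3 > 0."""
    if q == 0 or 27 * r**2 + 4 * q**3 <= 0:
        raise ValueError("needs q != 0 and 27 r^2 + 4 q^3 > 0")
    if q < 0:
        # cosh case: give t the sign of -r so that -4r/t^3 is positive
        t = math.copysign(math.sqrt(-4 * q / 3), -r)
        phi = math.acosh(max(1.0, -4 * r / t**3))   # guard against rounding
        return t * math.cosh(phi / 3)
    # sinh case (q > 0): sinh takes every real value, so no sign issue arises
    t = math.sqrt(4 * q / 3)
    gamma = math.asinh(-4 * r / t**3)
    return t * math.sinh(gamma / 3)


print(real_root_hyperbolic(-15, -126))   # the cubic x^3 - 15x - 126
print(real_root_hyperbolic(3, -4))       # x^3 + 3x - 4, an illustrative choice
```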

Exercises 

5.1 Assume that $0 \leq 3\alpha < 360°$.

(i) If $\cos 3\alpha$ is positive, show that there is an acute angle $\beta$ with $3\alpha = 3\beta$ or
$3\alpha = 3(\beta + 90°)$, and that the sets of numbers

$$\cos\beta, \quad \cos(\beta + 120°), \quad \cos(\beta + 240°)$$

and

$$\cos(\beta + 90°), \quad \cos(\beta + 210°), \quad \cos(\beta + 330°)$$

coincide.

(ii) If $\cos 3\alpha$ is negative, show that there is an acute angle $\beta$ with $3\alpha = 3(\beta + 30°)$ or
$3\alpha = 3(\beta + 60°)$, and that the sets of numbers

$$\cos(\beta + 30°), \quad \cos(\beta + 150°), \quad \cos(\beta + 270°)$$

and

$$\cos(\beta + 60°), \quad \cos(\beta + 180°), \quad \cos(\beta + 270°)$$

coincide.

5.2 (i) Find the roots of $f(x) = x^3 - 3x + 1$.

(ii) Find the roots of $f(x) = x^3 - 9x + 28$. Answer: $-4$, $2 \pm i\sqrt{3}$.

(iii) Find the roots of $f(x) = x^3 - 24x^2 - 24x - 25$. Answer: $25$, $-\frac{1}{2} \pm i\frac{\sqrt{3}}{2}$.

5.3 (i) Find the roots of $f(x) = x^3 - 15x - 4$ using the cubic formula. Answer:
$g = \sqrt[3]{2 + \sqrt{-121}}$ and $h = \sqrt[3]{2 - \sqrt{-121}}$.

(ii) Find the roots of $f(x)$ using the trigonometric formula. Answer: $4$, $-2 \pm \sqrt{3}$.

5.4 Find the roots of $f(x) = x^3 - 6x + 4$. Answer: $2$, $-1 \pm \sqrt{3}$.

5.5 Find the roots of $x^4 - 15x^2 - 20x - 6$. Answer: $-3$, $-1$, $2 \pm \sqrt{6}$.

*5.6 The following castle problem appears in an old Chinese text; it was solved by the
mathematician Qin Jiushao in 1247.

There is a circular castle whose diameter is unknown; it is provided 
with four gates, and two lengths out of the north gate there is a large 
tree, which is visible from a point six lengths east of the south gate. 

What is the length of the diameter? 
Figure 5.2 The Castle Problem


(i) Prove that the radius $r$ of the castle is a root of the cubic $X^3 + X^2 - 36$.

(ii) Show that one root of $f(X) = X^3 + X^2 - 36$ is an integer and find the
other two roots. Compare your method with Cardano’s formula and with
the trigonometric solution.

*5.7 Show that if $u$ is a root of a polynomial $f(x) \in \mathbb{R}[x]$, then the complex conjugate
$\bar{u}$ is also a root of $f(x)$.

*5.8 (i) Prove that $\cosh(3\theta) = 4\cosh^3(\theta) - 3\cosh(\theta)$.

(ii) Prove that $\sinh(3\theta) = 4\sinh^3(\theta) + 3\sinh(\theta)$.

*5.9 Show that if $\cos 3\theta = r$, then the roots of $4x^3 - 3x - r$ are

$$\cos\theta, \quad \cos(\theta + 120°), \quad\text{and}\quad \cos(\theta + 240°).$$

5.10 Find the roots of $x^3 - 9x + 28$.

5.11 Find the roots of $x^3 - 24x^2 - 24x - 25$.

5.12 (i) Find the roots of $x^3 - 15x - 4$ using the cubic formula.

(ii) Find the roots using the trigonometric formula.

5.13 Find the roots of $x^3 - 6x + 4$.

5.14 Find the roots of $x^4 - 15x^2 - 20x - 6$.


5.2 Insolvability of the General Quintic 

For almost 300 years, mathematicians sought some generalization of the quad- 
ratic, cubic, and quartic formulas that would give the roots of any polynomial. 
Finally, P. Ruffini (1765-1822), in 1799, and N. H. Abel (1802-1829), in 1824, 
proved that no such formula exists for the general quintic polynomial (both 
proofs had some gaps, but Abel’s proof was accepted by his contemporaries and 
Ruffini’s proof was not). Just before his untimely death, E. Galois (1811-1832) 
was able to determine precisely those polynomials whose roots can be found by 
a formula involving square, cube, fourth, . . . roots of numbers as well as the 
usual field operations of adding, subtracting, multiplying, and dividing. In so
doing, he also founded the Theory of Groups. 

If $f(x) \in k[x]$ is a monic polynomial, where $k$ is a field containing the roots
$z_1, z_2, \ldots, z_n$ (with possible repetitions), then

$$f(x) = x^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0 = (x - z_1)\cdots(x - z_n).$$

By induction on $n \geq 1$, one can easily generalize Exercise 3.99 on page 305:

$$\begin{aligned}
a_{n-1} &= -\sum_{i} z_i \\
a_{n-2} &= \sum_{i<j} z_iz_j \\
a_{n-3} &= -\sum_{i<j<k} z_iz_jz_k \\
        &\;\;\vdots \\
a_0 &= (-1)^n z_1z_2\cdots z_n.
\end{aligned} \tag{1}$$

Notice that $-a_{n-1}$ is the sum of the roots and that $\pm a_0$ is the product of the
roots. Given the coefficients of $f(x)$, can one find its roots; that is, given the
$a$’s, can one solve the system (1) of $n$ equations in $n$ unknowns? If $n = 2$, the
answer is “yes”: the quadratic formula works (this is precisely Corollary 5.3). If
$n = 3$ or 4, the answer is still “yes,” for the cubic and quartic formulas work.
But if $n \geq 5$, we shall see that no analogous solution exists.
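The easy direction of this system, from roots to coefficients, is a finite computation. The following sketch multiplies out the linear factors one at a time; the function name `monic_coefficients` and the convention of listing coefficients from lowest degree upward are ours.

```python
def monic_coefficients(roots):
    """Coefficients [a_0, a_1, ..., a_{n-1}] of the monic polynomial
    (x - z_1)...(x - z_n) with the given roots."""
    coeffs = [1]                      # the constant polynomial 1, lowest degree first
    for z in roots:
        # multiply the current polynomial p(x) by (x - z)
        shifted = [0] + coeffs                    # x * p(x)
        scaled = [-z * c for c in coeffs] + [0]   # -z * p(x)
        coeffs = [a + b for a, b in zip(shifted, scaled)]
    return coeffs[:-1]                # drop the leading coefficient 1


# roots 1, 2, -3 give x^3 + 0x^2 - 7x + 6
print(monic_coefficients([1, 2, -3]))
```

Going the other way, from the coefficients back to the roots, is the problem discussed in the rest of this section.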

We did not say that no solution of system (1) exists if $n \geq 5$; we said that
no solution analogous to the solutions of the classical formulas exists. We have 
already seen that the classical Greek problems are impossible to solve if we limit 
ourselves to using particular tools in a particular way; but these problems can be 
solved if we relax the restrictions (for example, we have seen how Archimedes 
trisected angles). Similarly, it is quite possible that there is some way of finding 
the roots of a polynomial if one does not limit oneself to field operations and ex- 
traction of roots only. For example, we have seen Viete’s trigonometric solution 
of the cubic. Indeed, one can find the roots of any $f(x) \in \mathbb{R}[x]$ by Newton’s
method: If $r$ is a real root of a polynomial $f(x)$ and if $x_0$ is a “good” approximation
to $r$, then $r = \lim_{n\to\infty} x_n$, where one defines $x_{n+1} = x_n - f(x_n)/f'(x_n)$.
There is a method of Hermite finding roots of quintics using elliptic modular
functions, and there are methods for finding the roots of many polynomials of
higher degree using hypergeometric functions. We are going to show, if $n \geq 5$,
that finding roots “by radicals” is not always possible.
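Newton’s iteration is short enough to sketch directly. The quintic $x^5 - x - 1$ and the starting point $x_0 = 1$ below are our own illustrative choices, not taken from the text; the fixed number of steps is a simplification, and the sketch assumes $f'(x_n)$ never vanishes along the way.

```python
def newton(f, fprime, x0, steps=50):
    """Iterate x_{n+1} = x_n - f(x_n)/f'(x_n) a fixed number of times."""
    x = x0
    for _ in range(steps):
        x = x - f(x) / fprime(x)
    return x


def f(x):
    return x**5 - x - 1


def fprime(x):
    return 5 * x**4 - 1


root = newton(f, fprime, 1.0)
print(root, f(root))   # an approximate real root and its (tiny) residual
```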

Let us remind the reader of several definitions and propositions from earlier 
chapters. If k is a subfield of a field K, then we also say that K is an extension of 
k. We abbreviate this by writing “K / k is an extension.” If K / k is an extension, 
then K may be regarded as a vector space over k, as in Example 4.1(iii). One 
says that $K$ is a finite extension of $k$ if $K$ is a finite-dimensional vector space
over $k$. The dimension of $K$, denoted by $[K : k]$, is called the degree of $K/k$.


Example 5.14. 

Let $p(x) \in k[x]$ be an irreducible polynomial of degree $n$, where $k$ is a field,
and let $k(z)/k$ be an extension obtained by adjoining a root $z$ of $p(x)$.
Proposition 3.116(iv) says that each element in $k(z)$ has a unique expression of the form
$b_0 + b_1z + \cdots + b_{n-1}z^{n-1}$, where $b_i \in k$. Thus, the list $1, z, z^2, \ldots, z^{n-1}$ is a
basis of $k(z)/k$, and so $\dim(k(z)) = n = \deg(p)$. ◄


Theorem 4.31. Let $k \subseteq K \subseteq E$ be fields, with $K$ a finite extension of $k$ and $E$
a finite extension of $K$. Then $E$ is a finite extension of $k$, and

$$[E : k] = [E : K][K : k].$$


Definition. Assume that $K/k$ is an extension and that $z \in K$. We call $z$ algebraic
over $k$ if there is some nonzero polynomial $f(x) \in k[x]$ having $z$ as a root;
otherwise, $z$ is called transcendental over $k$.

In Chapter 3, we considered adjoining one element to a field, examining 
k(z) in some detail. Let us now generalize this construction of adjoining one 
element to a field to adjoining a set of elements to a field. This will be especially 
interesting when we adjoin the set of all roots of a given polynomial. 


Definition. Let $k$ be a subfield of a field $K$ and let $\{z_1, \ldots, z_n\}$ be a subset
of $K$. The subfield of $K$ obtained by adjoining $z_1, \ldots, z_n$ to $k$, denoted
by $k(z_1, \ldots, z_n)$, is the intersection of all the subfields of $K$ containing $k$ and
$z_1, \ldots, z_n$.

Of course, $k(z_1, \ldots, z_n)$ is the smallest subfield of $K$ containing $k$ and all
the $z_i$ in the sense that if $S$ is any other subfield of $K$ containing $k$ and the $z_i$,
then $k(z_1, \ldots, z_n) \subseteq S$.

Proposition 4.32. If $K/k$ is a finite extension, then every $z \in K$ is algebraic
over $k$. Conversely, if $K = k(z_1, \ldots, z_n)$, and if each $z_i$ is algebraic over $k$, then
$K/k$ is a finite extension.

By Kronecker’s theorem, given $f(x) \in k[x]$, where $k$ is a field, there is an
extension $K/k$ containing all the roots of $f(x)$; that is, the polynomial $f(x)$ is a
product of linear factors in $K[x]$.

Definition. Let $k$ be a subfield of a field $K$, and let $f(x) \in k[x]$. We say that
$f(x)$ splits over $K$ if

$$f(x) = a(x - z_1)\cdots(x - z_n),$$

where $z_1, \ldots, z_n$ are in $K$ and $a \in k$ is nonzero.

An extension $E/k$ is called a splitting field of $f(x)$ over $k$ if $f(x)$ splits over
$E$, but $f(x)$ does not split over any proper subfield of $E$.

Example 5.15. 

Let $m \geq 1$, let $k$ be a field, and let $f(x) = x^m - 1 \in k[x]$. By Kronecker’s
theorem, there is an extension $K/k$ over which $f(x)$ splits. The roots of $f(x)$
are, of course, the $m$th roots of unity. Recall Theorem 3.122, which says that $K$
contains a primitive $m$th root of unity; that is, there is some $m$th root of unity,
say, $z \in K$, with every $m$th root of unity being a power of $z$.

Let $p$ be a prime, and consider $g(x) = x^p - 1$. If $k$ has characteristic $\neq p$,
then $g(x)$ has no repeated roots [by Exercise 3.63 on page 271, $g(x)$ has no
repeated roots if and only if $(g, g') = 1$, where $g'(x)$ is the derivative of $g(x)$].
On the other hand, if $k$ has characteristic $p$, then $x^p - 1 = (x - 1)^p$, and so there
is only one $p$th root of unity, namely, 1.

Now consider $h(x) = x^p - a \in k[x]$, and let $k(u)$ be the extension obtained
from $k$ by adjoining $u$, where $u^p = a$. If $k$ has characteristic $\neq p$
and if $k$ contains the $p$th roots of unity, then we claim that $k(u)$ is a splitting
field of $h(x)$ over $k$. If $z$ is a primitive $p$th root of unity, then the roots of $h(x)$ are
$u, zu, z^2u, \ldots, z^{p-1}u$. Therefore, $k(u)$ is a splitting field of $h(x)$ over $k$. On the
other hand, if $k$ has characteristic $p$, then $h(x) = x^p - a = x^p - u^p = (x - u)^p$.
Thus, there is only one root of $h(x)$, and so $k(u)$ is a splitting field of $h(x)$ over
$k$ in this case as well. ◄
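The identity $x^p - 1 = (x - 1)^p$ in characteristic $p$ can be checked coefficient by coefficient with the binomial theorem. The sketch below works in the integers mod $p$ only, which is our simplifying assumption (a general field of characteristic $p$ is not modelled), and the function name is ours.

```python
from math import comb


def x_minus_one_power_mod_p(p):
    """Coefficients of (x - 1)^p reduced mod p, lowest degree first."""
    # Binomial theorem: the coefficient of x^i is C(p, i) * (-1)^(p - i).
    return [(comb(p, i) * (-1) ** (p - i)) % p for i in range(p + 1)]


for p in (2, 3, 5, 7):
    expected = [(-1) % p] + [0] * (p - 1) + [1]   # x^p - 1 reduced mod p
    print(p, x_minus_one_power_mod_p(p) == expected)
```

All the middle binomial coefficients are divisible by $p$ when $p$ is prime, which is why only the constant and leading terms survive.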

Proposition 5.16. If $f(x) \in k[x]$, where $k$ is a field, then a splitting field $E/k$
of $f(x)$ exists.

Proof. By Kronecker’s theorem, Theorem 3.118, there exists an extension $K/k$
with $f(x) = a(x - z_1)\cdots(x - z_n)$ in $K[x]$. If we define $E = k(z_1, \ldots, z_n)$,
where $z_1, \ldots, z_n$ are the roots of $f(x)$, then $f(x)$ splits over $E$. If $B \subsetneq E$ is
a proper subfield of $E$, then some $z_i \notin B$, and so $f(x)$ does not split over $B$.
Therefore, $E$ is a splitting field of $f(x)$. •

A splitting field of $f(x) \in k[x]$ is the smallest subfield $E$ of $K$ containing $k$
and all the roots of $f(x)$. For example, consider $f(x) = x^2 + 1 \in \mathbb{Q}[x]$. The
roots of $f(x)$ are $\pm i$, and so $f(x)$ splits over $\mathbb{C}$; that is, $f(x) = (x - i)(x + i)$
is a product of linear polynomials in $\mathbb{C}[x]$. However, $\mathbb{C}$ is not a splitting field
because $\mathbb{C}$ is not the smallest field containing $\mathbb{Q}$ and all the roots of $f(x)$; here,
$\mathbb{Q}(i)$ is a splitting field.

The reason we say “a” splitting field instead of “the” splitting field is that the 
definition involves not only $f(x)$ and $k$, but the larger field $K$ as well. If $f(x)$
splits in $K[x]$, where $K/k$ is a field extension, then the proof of Proposition 5.16
shows that there is a unique splitting field $E$ of $f(x)$ contained in $K$, namely,
$E = k(z_1, \ldots, z_n)$. However, if no such extension $K$ is given, then splitting
fields may be distinct. In Theorem 5.23, we shall see that any two splitting fields
of $f(x)$ over $k$ are, in fact, isomorphic. Analysis of this technical point will
enable us to prove that any two finite fields with the same number of elements 
are isomorphic. 

Example 5.17. 

Let $E = F(y_1, \ldots, y_n)$ be the field of all rational functions in $n$ variables
$y_1, \ldots, y_n$ with coefficients in a field $F$; that is, $E = \operatorname{Frac}(F[y_1, \ldots, y_n])$,
the fraction field of the polynomial ring in $n$ variables. The coefficients of
$f(x) = (x - y_1)(x - y_2)\cdots(x - y_n)$, which we denote by $a_i$, are given explicitly
in terms of the $y$’s by Eqs. (1) on page 448. Define $k = F(a_0, \ldots, a_{n-1})$. Notice
that $E$ is a splitting field of $f(x)$ over $k$, for it arises from $k$ by adjoining to it all
the roots of $f(x)$, namely, all the $y$’s. ◄

Definition. Let $E$ be a field containing a subfield $k$. An automorphism⁷ of $E$ is
an isomorphism $\sigma\colon E \to E$; we say that $\sigma$ fixes $k$ if $\sigma(a) = a$ for every $a \in k$.

Remark. If $E/k$ is a field extension, then Example 4.1(iii) shows that $E$ is a
vector space over $k$. If $\sigma\colon E \to E$ is an automorphism fixing $k$, then $\sigma$ is a linear
transformation. Clearly, $\sigma(z + z') = \sigma(z) + \sigma(z')$ for all $z, z' \in E$, but $\sigma$ also
preserves scalar multiplication: if $a \in k$, then

$$\sigma(az) = \sigma(a)\sigma(z) = a\sigma(z),$$

because $\sigma$ fixes $k$. ◄

We have seen that a splitting field of $x^2 + 1 \in \mathbb{Q}[x]$ is $E = \mathbb{Q}(i)$. Complex
conjugation $\sigma\colon a \mapsto \bar{a}$ is an example of an automorphism of $E$ fixing $\mathbb{Q}$.
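A spot check of this claim can be carried out with exact arithmetic. In the sketch below an element $a + bi$ of $\mathbb{Q}(i)$ is stored as a pair of `Fraction`s; the helper names `add`, `mul`, and `conj` and the sample elements are ours, and testing two elements illustrates the identities rather than proving them.

```python
from fractions import Fraction


# An element a + b*i of Q(i) is stored as the pair (a, b) of Fractions.
def add(z, w):
    return (z[0] + w[0], z[1] + w[1])


def mul(z, w):
    # (a + bi)(c + di) = (ac - bd) + (ad + bc)i
    return (z[0] * w[0] - z[1] * w[1], z[0] * w[1] + z[1] * w[0])


def conj(z):
    # complex conjugation: a + bi -> a - bi
    return (z[0], -z[1])


z = (Fraction(1, 2), Fraction(3))
w = (Fraction(-2), Fraction(5, 7))
print(conj(add(z, w)) == add(conj(z), conj(w)))   # conjugation preserves sums
print(conj(mul(z, w)) == mul(conj(z), conj(w)))   # and products
a = (Fraction(4, 9), Fraction(0))
print(conj(a) == a)                               # and fixes rational numbers
```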

Proposition 5.18. Let $k$ be a subfield of a field $K$, let

$$f(x) = x^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0 \in k[x],$$

and let $E = k(z_1, \ldots, z_n)$ be a splitting field. If $\sigma\colon E \to E$ is an automorphism
fixing $k$, then $\sigma$ permutes the roots $z_1, \ldots, z_n$ of $f(x)$.

⁷ The word automorphism is made up of two Greek roots: auto meaning “self” and morph
meaning “shape” or “form.” Just as an isomorphism carries one group onto an identical
replica, an automorphism carries a group onto itself.

Proof. If $z$ is a root of $f(x)$, then

$$0 = f(z) = z^n + a_{n-1}z^{n-1} + \cdots + a_1z + a_0.$$

Applying $\sigma$ to this equation gives

$$\begin{aligned}
0 &= \sigma(z)^n + \sigma(a_{n-1})\sigma(z)^{n-1} + \cdots + \sigma(a_1)\sigma(z) + \sigma(a_0) \\
  &= \sigma(z)^n + a_{n-1}\sigma(z)^{n-1} + \cdots + a_1\sigma(z) + a_0,
\end{aligned}$$

because $\sigma$ fixes $k$. Therefore, $\sigma(z)$ is a root of $f(x)$; if $Z$ is the set of all the roots,
then $\sigma'\colon Z \to Z$, where $\sigma'$ is the restriction $\sigma|Z$. But $\sigma'$ is injective (because $\sigma$
is), so that Exercise 2.12 on page 102 says that $\sigma'$ is a permutation. •


Corollary 5.19. Let $k \subseteq B \subseteq F$ be a tower of fields, where $B$ is the splitting
field of some polynomial $f(x) \in k[x]$. If $\sigma\colon F \to F$ is an automorphism
fixing $k$, then $\sigma(B) = B$.

Proof. By Proposition 5.18, $\sigma$ permutes the roots $z_1, \ldots, z_n$ of $f(x)$, so that
$\sigma(B) \subseteq B$. As vector spaces over $k$, we have $B \cong \sigma(B)$, for $\sigma$ is an injective
linear transformation. Since $[B : k] < \infty$, by Exercise 5.23 on page 468, both $B$
and $\sigma(B)$ are finite-dimensional and $\dim(B) = \dim(\sigma(B))$. Corollary 4.25(ii)
now gives $B = \sigma(B)$. •

The following proposition will be useful. 

Proposition 5.20. Let $E = k(z_1, \ldots, z_n)$. If $\sigma\colon E \to E$ is an automorphism
fixing $k$ and if $\sigma(z_i) = z_i$ for all $i$, then $\sigma$ is the identity.

Proof. We prove the proposition by induction on $n \geq 1$. If $n = 1$, then each
$u \in E$ has the form $f(z_1)/g(z_1)$, where $f(x), g(x) \in k[x]$ and $g(z_1) \neq 0$. But $\sigma$
fixes $z_1$ as well as the coefficients of $f(x)$ and of $g(x)$, so that $\sigma$ fixes all $u \in E$.
For the inductive step, write $K = k(z_1, \ldots, z_{n-1})$, and note that $E = K(z_n)$ [for
$K(z_n)$ is the smallest subfield containing $k$ and $z_1, \ldots, z_{n-1}, z_n$]. The inductive
step is just a repetition of the base step with $k$ replaced by $K$. •


Definition. Let $k$ be a subfield of a field $E$. The Galois group of $E$ over $k$,
denoted by $\operatorname{Gal}(E/k)$, is the set of all those automorphisms of $E$ that fix $k$. If
$f(x) \in k[x]$, and if $E = k(z_1, \ldots, z_n)$ is a splitting field, then the Galois group
of $f(x)$ over $k$ is defined to be $\operatorname{Gal}(E/k)$.

It is easy to check that $\operatorname{Gal}(E/k)$ is a group with operation composition of
functions. This definition is due to E. Artin (1898-1962), in keeping with his
and E. Noether’s emphasis on “abstract” algebra. Galois’s original version (a
group isomorphic to this one) was phrased, not in terms of automorphisms, but
in terms of certain permutations of the roots of a polynomial.

If $f(x) = x^2 + 1 \in \mathbb{Q}[x]$, then complex conjugation $\sigma$ is an automorphism
of its splitting field $\mathbb{Q}(i)$, which fixes $\mathbb{Q}$ (it interchanges the roots $i$ and $-i$).
Since $\operatorname{Gal}(\mathbb{Q}(i)/\mathbb{Q})$ is a subgroup of the symmetric group $S_2$, which has order
2, it follows that $\operatorname{Gal}(\mathbb{Q}(i)/\mathbb{Q}) = \langle\sigma\rangle \cong \mathbb{I}_2$. One should regard the elements of
$\operatorname{Gal}(E/k)$ as generalizations of complex conjugation.


Theorem 5.21. If $f(x) \in k[x]$ has degree $n$, then its Galois group $\operatorname{Gal}(E/k)$
is isomorphic to a subgroup of $S_n$.

Proof. Let $E/k$ be a splitting field of $f(x)$ over $k$, and let $X = \{z_1, \ldots, z_n\}$
be the set of roots of $f(x)$ in $E$. If $\sigma \in \operatorname{Gal}(E/k)$, then Proposition 5.18
shows that its restriction $\sigma|X$ is a permutation of $X$; that is, $\sigma|X \in S_X$. Define
$\varphi\colon \operatorname{Gal}(E/k) \to S_X$ by $\varphi\colon \sigma \mapsto \sigma|X$. To see that $\varphi$ is a homomorphism,
note that both $\varphi(\sigma\tau)$ and $\varphi(\sigma)\varphi(\tau)$ are functions $X \to X$, and hence
they are equal if they agree on each $z_i \in X$. But $\varphi(\sigma\tau)\colon z_i \mapsto (\sigma\tau)(z_i)$, while
$\varphi(\sigma)\varphi(\tau)\colon z_i \mapsto \sigma(\tau(z_i))$, and these are the same.

The image of $\varphi$ is a subgroup of $S_X \cong S_m$, where $m = |X| \leq n$ (if $f(x)$ has
repeated roots, then $m < n$). The kernel of $\varphi$ is the set of all $\sigma \in \operatorname{Gal}(E/k)$ such
that $\sigma$ is the identity permutation on $X$; that is, $\sigma$ fixes each of the roots $z_i$. As $\sigma$
also fixes $k$, by definition of the Galois group, Proposition 5.20 gives $\ker\varphi = \{1\}$.
Therefore, $\varphi$ is injective; that is, $\operatorname{Gal}(E/k)$ is isomorphic to a subgroup of $S_m$.
If $m = n$, the proof is done. If $m < n$, that is, if $f(x)$ has repeated roots, note
that $S_m$ is isomorphic to a subgroup of $S_n$. For example, $S_m$ is isomorphic to
the subgroup of all permutations in $S_n$ that fix each of $m + 1, \ldots, n$. Thus, the
theorem is true even if $f(x)$ has repeated roots. •

We are now going to compare different splitting fields of a polynomial over 
a given field $k$. The definition of a splitting field $E$ of $f(x) \in k[x]$ was given
in terms of some field extension $K/k$ over which $f(x)$ is a product of linear
factors. But what if $K$ is not given at the outset? For example, suppose that
$k = \mathbb{C}(x)$ and $f(y) = y^2 - x$, or that $k = \mathbb{F}_3$ and $f(x) = x^9 - x \in \mathbb{F}_3[x]$.
Now Kronecker’s theorem, Theorem 3.118, gives a field extension of $\mathbb{C}(x)$
containing $\sqrt{x}$, and it gives a field extension $K/\mathbb{F}_3$ containing all the roots of
$f(x)$. Neither of these field extensions is unique; for example, several splitting
fields of $x^9 - x$ over $\mathbb{F}_3$ are given in Example 3.125. Nevertheless, we are going
to show that, up to isomorphism, splitting fields do not depend on the choice of
extension field $K$.
The next result constructs automorphisms in $\operatorname{Gal}(E/k)$, and it also counts
the number of them when $k$ has characteristic 0.

Recall Exercise 3.44 on page 248: if $R$ and $S$ are commutative rings and
$\varphi\colon R \to S$ is a homomorphism, then $\varphi^*\colon R[x] \to S[x]$, defined by

$$\varphi^*\colon f(x) = r_0 + r_1x + r_2x^2 + \cdots \;\mapsto\; \varphi(r_0) + \varphi(r_1)x + \varphi(r_2)x^2 + \cdots = f^*(x),$$

is a homomorphism; if $\varphi$ is an isomorphism, so is $\varphi^*$.

Proposition 5.22. Let $f(x) \in k[x]$, and let $E$ be a splitting field of $f(x)$
over $k$. Let $\varphi\colon k \to k'$ be an isomorphism of fields, let $\varphi^*\colon k[x] \to k'[x]$ be the
isomorphism $g(x) \mapsto g^*(x)$ given by Exercise 3.44 on page 248, and let $E'$ be
a splitting field of $f^*(x)$ over $k'$.

(i) There exists an isomorphism $\Phi\colon E \to E'$ extending $\varphi$.

(ii) If $k$ has characteristic 0, there are exactly $[E : k]$ isomorphisms $\Phi\colon E \to E'$
extending $\varphi$.

Proof 

(i) The proof is by induction on $[E : k]$. If $[E : k] = 1$, then $f(x)$ is a product of
linear polynomials in $k[x]$, and it follows easily that $f^*(x)$ is also a product of
linear polynomials in $k'[x]$. Thus, we may set $\Phi = \varphi$.

For the inductive step, choose a root $z$ of $f(x)$ in $E$ that is not in $k$, and let $p(x)$
be the irreducible polynomial in $k[x]$ of which $z$ is a root [Proposition 3.116(i)].
Since $z \notin k$, $\deg(p) > 1$; moreover, $[k(z) : k] = \deg(p)$, by Example 5.14. Let
$p^*(x)$ be the corresponding polynomial in $k'[x]$, and let $z'$ be a root of $p^*(x)$ in
$E'$. Note that $p^*(x)$ is irreducible, because the isomorphism $k[x] \to k'[x]$ takes
irreducible polynomials to irreducible polynomials.

By Exercise 3.97 on page 304, there is an isomorphism $\widehat{\varphi}\colon k(z) \to k'(z')$
extending $\varphi$ with $\widehat{\varphi}(z) = z'$. We now regard $f(x)$ as a polynomial in $k(z)[x]$ (for
$k \subseteq k(z)$ implies $k[x] \subseteq k(z)[x]$). We claim that $E$ is a splitting field of $f(x)$
over $k(z)$; that is,

$$E = k(z)(z_1, \ldots, z_n),$$

where $z_1, \ldots, z_n$ are the roots of $f(x)$. Clearly,

$$E = k(z_1, \ldots, z_n) \subseteq k(z)(z_1, \ldots, z_n).$$

For the reverse inclusion, since $z \in E$, we have

$$k(z)(z_1, \ldots, z_n) \subseteq k(z_1, \ldots, z_n) = E.$$

But $[E : k(z)] < [E : k]$, by Theorem 4.31, so that the inductive hypothesis
gives an isomorphism $\Phi\colon E \to E'$ that extends $\widehat{\varphi}$, and hence $\varphi$.

(ii) The proof in this part is again by induction on $[E : k]$. If $[E : k] = 1$, then 
$E = k$ and there is only one extension, namely, $\Phi = \varphi$. If $[E : k] > 1$, let 
$f(x) = p(x)g(x)$ in $k[x]$, where $p(x)$ is an irreducible factor of largest degree, 
say, $d$. We may assume that $d > 1$, otherwise $f(x)$ splits over $k$ and $[E : k] = 1$. 
Choose a root $z \in E$ of $p(x)$ [this is possible because $E/k$ is a splitting field of 
$f(x) = p(x)g(x)$]. As in part (i), the polynomial $p^*(x) \in k'[x]$ is irreducible, 
and there is some root $z'$ of $p^*(x)$ in $E'$. Since $k$ has characteristic 0, 
Exercise 3.91 on page 304 shows that $p(x)$ and $p^*(x)$ have no repeated roots; that is, 
each has $d$ distinct roots. By Proposition 3.116(iii), there exist $d$ isomorphisms 
$\widetilde{\varphi} : k(z) \to k'(z')$ extending $\varphi$, one for each of the roots $z'$; there are no other 
isomorphisms extending $\varphi$, for any such extension must send $z$ into some $z'$, in 
which case Proposition 5.20 shows it is already one of the $\widetilde{\varphi}$. As in part (i), $E$ 
is a splitting field of $f(x)$ over $k(z)$, and $E'$ can be viewed as a splitting field of 
$f^*(x)$ over $k'(z')$. But $[E : k] = [E : k(z)][k(z) : k] = [E : k(z)]d$, so that 
$[E : k(z)] < [E : k]$. By induction, each $\widetilde{\varphi}$ has exactly $[E : k(z)]$ extensions 
$\Phi : E \to E'$. Thus, we have exhibited $[E : k(z)][k(z) : k] = [E : k]$ such 
extensions $\Phi$. If $\tau : E \to E'$ is another extension of $\varphi$, then $\tau(z) = z'$ for some 
root $z'$ of $p^*(x)$, and so $\tau$ is an extension of that $\widetilde{\varphi}$ with $\widetilde{\varphi}(z) = z'$. But all such 
extensions $E \to E'$ have already been counted. • 

Theorem 5.23. If $k$ is a field and $f(x) \in k[x]$, then any two splitting fields of 
$f(x)$ over $k$ are isomorphic. 

Proof. Let $E$ and $E'$ be splitting fields of $f(x)$ over $k$. If $\varphi$ is the identity, then 
Proposition 5.22(i) applies at once. • 

Corollary 5.24. The Galois group $\mathrm{Gal}(E/k)$ of a polynomial $f(x) \in k[x]$ with 
splitting field $E$ depends only on $f(x)$ and $k$, but not upon the choice of $E$. 

Proof. If $\varphi : E \to E'$ is an isomorphism fixing $k$, then there is an isomorphism 
$\mathrm{Gal}(E/k) \to \mathrm{Gal}(E'/k)$ given by $\sigma \mapsto \varphi\sigma\varphi^{-1}$. • 

It is remarkable that the next theorem was not proved until the 1890s, 60 
years after Galois discovered finite fields. 

Corollary 5.25 (E. H. Moore). Any two finite fields having exactly $p^n$ 
elements are isomorphic. 

Proof. If $E$ is a field with $q = p^n$ elements, then Lagrange's theorem applied 
to the multiplicative group $E^\times$ shows that $a^{q-1} = 1$ for every $a \in E^\times$. It 
follows that every element of $E$, including $a = 0$, is a root of $f(x) = x^q - x = 
x(x^{q-1} - 1) \in \mathbb{F}_p[x]$, and so $E$ is a splitting field of $f(x)$ over $\mathbb{F}_p$. By 
Theorem 5.23, any two such splitting fields are isomorphic. • 

It follows that if $g(x), h(x) \in \mathbb{F}_p[x]$ are irreducible polynomials of degree $n$, 
then $\mathbb{F}_p[x]/(g(x)) \cong \mathbb{F}_p[x]/(h(x))$, for both are fields with exactly $p^n$ elements. 
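
As a small computational check of Corollary 5.25 (a sketch of ours, not part of the original text), the following Python fragment models $\mathbb{F}_9$ as $\mathbb{F}_3[x]/(x^2 + 1)$, storing $a + bi$ as a pair `(a, b)` with $i^2 = -1$. The polynomials $x^2 + 1$ and $h(x) = x^2 + x + 2$, and all function names, are illustrative choices. The fragment checks that every element is a root of $x^9 - x$, and it finds the roots of $h(x)$ inside this model; sending the coset of $x$ to either root gives an isomorphism $\mathbb{F}_3[x]/(h(x)) \to \mathbb{F}_3[x]/(x^2 + 1)$.

```python
from itertools import product

P = 3  # characteristic of the prime field F_3


def add(u, v):
    """Add two elements a + b*i of F_3[x]/(x^2 + 1), stored as pairs (a, b)."""
    return ((u[0] + v[0]) % P, (u[1] + v[1]) % P)


def mul(u, v):
    """Multiply (a + b*i)(c + d*i) using i^2 = -1, reducing mod 3."""
    a, b = u
    c, d = v
    return ((a * c - b * d) % P, (a * d + b * c) % P)


def power(u, n):
    """Compute u^n by repeated multiplication."""
    result = (1, 0)
    for _ in range(n):
        result = mul(result, u)
    return result


ELEMENTS = list(product(range(P), repeat=2))  # all 9 elements

# Every element of a field with q = 9 elements is a root of x^q - x.
all_roots_of_xq_minus_x = all(power(u, 9) == u for u in ELEMENTS)

# Roots of h(x) = x^2 + x + 2 in this model of F_9.
roots_of_h = sorted(u for u in ELEMENTS
                    if add(add(mul(u, u), u), (2, 0)) == (0, 0))
```

Here `roots_of_h` consists of the two elements $1 + i$ and $1 + 2i$, so $h(x)$, which has no roots in $\mathbb{F}_3$, splits in the field built from $x^2 + 1$.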

E. H. Moore (1862-1932) began his mathematical career as an algebraist, but 
he did important work in many other parts of mathematics as well; for example, 
Moore-Smith convergence is named in part after him. 

We can now compute the order of the Galois group $\mathrm{Gal}(E/k)$ when $k$ has 
characteristic 0. 

Theorem 5.26. If $E/k$ is the splitting field of some polynomial in $k[x]$, where 
$k$ is a field of characteristic 0, then $|\mathrm{Gal}(E/k)| = [E : k]$. 

Proof. This is the special case of Proposition 5.22(ii) when $k = k'$, $E = E'$, 
and $\varphi = 1_k$. • 

Remark. Theorem 5.26 may not be true if $k$ is a field of characteristic $p > 0$. It 
is true if $k$ is a finite field, but it is false for $k = \mathbb{F}_p(x)$, all rational functions over 
$\mathbb{F}_p$. The key to investigating this question involves the notion of separability. A 
counterexample to Theorem 5.26 when $\mathrm{char}(k) = p$ is described in Exercise 5.31 
on page 469. ◄ 

Corollary 5.27. Let $f(x) \in k[x]$ be an irreducible polynomial of degree $n$, 
where $k$ is a field of characteristic 0. If $E/k$ is a splitting field of $f(x)$ over $k$, 
then $n$ is a divisor of $|\mathrm{Gal}(E/k)|$. 

Proof. If $z \in E$ is a root of $f(x)$, then $[k(z) : k] = n$, as in Example 5.14. But 
$[E : k] = [E : k(z)][k(z) : k]$, so that $n \mid [E : k]$. Since $k$ has characteristic 0, 
Theorem 5.26 gives $|\mathrm{Gal}(E/k)| = [E : k]$. • 

If $k$ is a field, then the factorization into irreducibles of a polynomial in $k[x]$ 
can change as one enlarges the ground field $k$. 

Lemma 5.28. Let $B/k$ be a splitting field of some polynomial $g(x) \in k[x]$. If 
$p(x) \in k[x]$ is irreducible, and if 

$$p(x) = q_1(x) \cdots q_t(x)$$

is the factorization of $p(x)$ into irreducibles in $B[x]$, then all the $q_i(x)$ have the 
same degree. 

Proof. Regard $p(x)$ as a polynomial in $B[x]$ (for $k \subseteq B$ implies $k[x] \subseteq B[x]$), 
and let $E = B(z_1, \ldots, z_n)$ be a splitting field of $p(x)$, where $z_1, \ldots, z_n$ are the 
roots of $p(x)$. If $p(x)$ does not factor in $B[x]$, we are done. Otherwise, choose 
$z_1$ to be a root of $q_1(x)$ and, for each $j \ne 1$, choose $z_j$ to be a root of $q_j(x)$. 
Since both $z_1$ and $z_j$ are roots of the irreducible $p(x)$, Proposition 3.116(iii) 
gives an isomorphism $\varphi_j : k(z_1) \to k(z_j)$ with $\varphi_j(z_1) = z_j$ which fixes $k$ 
pointwise. Now Proposition 5.22(i) says that $\varphi_j$ extends to an automorphism $\Phi_j$ of 
$E$, and Corollary 5.19 gives $\Phi_j(B) = B$. Hence, $\Phi_j$ induces an isomorphism 
$\Phi_j^* : B[x] \to B[x]$ (by letting $\Phi_j$ act on the coefficients of a polynomial). It 
follows that 

$$p^*(x) = q_1^*(x) \cdots q_t^*(x),$$

where $p^*(x) = \Phi_j^*(p)$ and $q_i^*(x) = \Phi_j^*(q_i)$ for all $i$. Note that all the $q_i^*(x)$ are 
irreducible, because isomorphisms take irreducible polynomials into irreducible 
polynomials. Now $p^*(x) = p(x)$, because $\Phi_j$ fixes $k$ pointwise, and so unique 
factorization in $B[x]$ gives $q_1^*(x) = q_i(x)$ for some $i$. But $z_j = \Phi_j(z_1)$ is a root 
of $q_1^*(x)$, so that $q_1^*(x) = q_j(x)$. Therefore, $\deg(q_1) = \deg(q_1^*) = \deg(q_j)$, and 
all the $q_i$ have the same degree. • 

This lemma allows us to characterize those field extensions which are split- 
ting fields. 


Theorem 5.29. Let $E/k$ be a finite field extension. Then $E/k$ is a splitting field 
of some polynomial in $k[x]$ if and only if every irreducible polynomial in $k[x]$ 
having a root in $E$ must split in $E[x]$. 

Proof. Suppose that $E/k$ is a splitting field of some polynomial in $k[x]$. Let 
$p(x) \in k[x]$ be irreducible, and let $p(x) = q_1(x) \cdots q_t(x)$ be its factorization 
into irreducibles in $E[x]$. If $p(x)$ has a root in $E$, then it has a linear factor in 
$E[x]$; by Lemma 5.28, all the $q_i(x)$ are linear, and so $p(x)$ splits in $E[x]$. 

Conversely, assume that every irreducible polynomial in $k[x]$ having a root 
in $E$ must split in $E[x]$. Choose $\beta_1 \in E$ with $\beta_1 \notin k$. Since $E/k$ is finite, 
Proposition 3.116(i) gives an irreducible polynomial $p_1(x) \in k[x]$ having $\beta_1$ as 
a root. By hypothesis, $p_1(x)$ splits in $E[x]$; let $B_1 \subseteq E$ be a splitting field of 
$p_1(x)$. If $B_1 = E$, we are done. Otherwise, choose $\beta_2 \in E$ with $\beta_2 \notin B_1$. As 
above, there is an irreducible $p_2(x) \in k[x]$ having $\beta_2$ as a root. Define $B_2 \subseteq E$ 
to be the splitting field of $p_1(x)p_2(x)$, so that $k \subseteq B_1 \subseteq B_2 \subseteq E$. Since $E/k$ is 
finite, this process eventually ends with $E = B_r$ for some $r \ge 1$. • 


Definition. A field extension $E/k$ is a normal extension if every irreducible 
polynomial $p(x) \in k[x]$ having a root in $E$ splits in $E[x]$. 

Here is the basic strategy for showing that there are polynomials of degree 5 
for which there is no formula, analogous to the classical formulas, giving their 
roots. First, we will translate the classical formulas (giving the roots of $f(x) \in 
k[x]$) in terms of subfields of a splitting field $E$ over $k$. Second, this translation 
into the language of fields will itself be translated into the language of groups: If 
there is a formula for the roots of $f(x)$, then $\mathrm{Gal}(E/k)$ must be a solvable group 
(which we will soon define). Finally, polynomials of degree at least 5 can have 
Galois groups that are not solvable. 

Formulas and Solvability by Radicals 

Without further ado, here is the translation of the existence of a formula for the 
roots of a polynomial in terms of subfields of a splitting field. 

Definition. A pure extension of type $m$ is an extension $k(u)/k$, where $u^m \in k$ 
for some $m \ge 1$. An extension $K/k$ is a radical extension if there is a tower of 
fields 

$$k = K_0 \subseteq K_1 \subseteq \cdots \subseteq K_t = K \qquad (2)$$

in which each $K_{i+1}/K_i$ is a pure extension of type $m_i$. One calls Eq. (2) a radical 
tower. 

It is easy to see that any field extension $K/k$ with $[K : k] \le 2$ is a pure 
extension. By Theorem 4.53, a complex number $z$ is constructible if and only if 
it is polyquadratic; that is, there is a tower of fields $\mathbb{Q}(i) = F_0 \subseteq F_1 \subseteq \cdots \subseteq F_n$ 
with $z \in F_n$ and with $[F_j : F_{j-1}] \le 2$ for all $j$. Exercise 5.16 on page 468 asks 
you to prove that $\mathbb{Q}(i, z)/\mathbb{Q}$ is a radical extension. 

When we say that there is a formula for the roots of a polynomial $f(x)$ 
analogous to the quadratic, cubic, and quartic formulas, we mean that there is 
some expression giving the roots of $f(x)$ in terms of the coefficients of $f(x)$. 
As in the classical formulas, the expression may involve the field operations, 
constants, and extraction of roots, but it should not involve any other operations 
involving cosines, definite integrals, or limits, for example. We maintain that a 
formula as informally described above exists precisely when $f(x)$ is solvable by 
radicals in the following sense. 

Definition. Let $f(x) \in k[x]$ have a splitting field $E$. We say that $f(x)$ is 
solvable by radicals if there is a radical extension 

$$k = K_0 \subseteq K_1 \subseteq \cdots \subseteq K_t$$

with $E \subseteq K_t$. 





Example 5.30. 

For every field $k$ and every $m \ge 1$, we show that the polynomial $f(x) = x^m - 1 \in 
k[x]$ is solvable by radicals. Recall that the set $\Gamma_m$ of all $m$th roots of unity in a 
splitting field $E/k$ of $f(x)$ is a cyclic group; a generator $\zeta$ is called a primitive 
root of unity. Note that $|\Gamma_m| = m$ unless $k$ has characteristic $p > 0$ and $p \mid m$, 
in which case $|\Gamma_m| = m'$, where $m = p^e m'$ and $p \nmid m'$. Now $E = k(\zeta)$, so 
that $E$ is a pure extension of $k$, and hence $E/k$ is a radical extension. Therefore, 
$f(x) = x^m - 1$ is solvable by radicals. ◄ 

Let us illustrate this definition by considering the classical formulas for the 
polynomials of small degree. 


Quadratics 

Let $f(x) = x^2 + bx + c \in \mathbb{Q}[x]$. Define $K_1 = \mathbb{Q}(u)$, where $u = \sqrt{b^2 - 4c}$. 
Then $K_1$ is a radical extension of $\mathbb{Q}$, for $u^2 \in \mathbb{Q}$. Moreover, the quadratic formula 
implies that $K_1$ is the splitting field of $f(x)$, and so $f(x)$ is solvable by radicals. 

Cubics 

Let $f(X) = X^3 + bX^2 + cX + d \in \mathbb{Q}[x]$. The change of variable $X = 
x - \frac{1}{3}b$ yields a new polynomial $\widetilde{f}(x) = x^3 + qx + r \in \mathbb{Q}[x]$ having the same 
splitting field $E$ [for if $u$ is a root of $\widetilde{f}(x)$, then $u - \frac{1}{3}b$ is a root of $f(X)$]. 
Define $K_1 = \mathbb{Q}(\sqrt{D})$, where $D = r^2 + \frac{4}{27}q^3$, and define $K_2 = K_1(\alpha)$, where 
$\alpha^3 = \frac{1}{2}(-r + \sqrt{D})$. The cubic formula shows that $K_2$ contains the root $\alpha + \beta$ 
of $\widetilde{f}(x)$, where $\beta = -q/3\alpha$. Finally, define $K_3 = K_2(\omega)$, where $\omega^3 = 1$. The 
other roots of $\widetilde{f}(x)$ are $\omega\alpha + \omega^2\beta$ and $\omega^2\alpha + \omega\beta$, both of which lie in $K_3$, and 
so $E \subseteq K_3$. 
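
To see the tower for the cubic in action, here is a minimal numerical sketch (our own illustration, using floating-point complex arithmetic rather than exact field extensions). It follows the recipe just described for a reduced cubic $x^3 + qx + r$: compute $D = r^2 + \frac{4}{27}q^3$, take a cube root $\alpha$ of $\frac{1}{2}(-r + \sqrt{D})$, set $\beta = -q/3\alpha$, and use a primitive cube root of unity $\omega$. The function name `cubic_roots` and the sample cubic $x^3 - 6x - 9$ are assumptions made for the example, and the sketch requires $\alpha \neq 0$.

```python
import cmath


def cubic_roots(q, r):
    """Roots of x**3 + q*x + r via the cubic formula (assumes alpha != 0)."""
    D = r * r + 4 * q ** 3 / 27                     # element adjoined to form K_1
    alpha = ((-r + cmath.sqrt(D)) / 2) ** (1 / 3)   # a cube root, adjoined in K_2
    beta = -q / (3 * alpha)                         # forces alpha * beta = -q/3
    omega = cmath.exp(2j * cmath.pi / 3)            # primitive cube root of unity, K_3
    return [alpha + beta,
            omega * alpha + omega ** 2 * beta,
            omega ** 2 * alpha + omega * beta]


# Sample cubic x^3 - 6x - 9 = (x - 3)(x^2 + 3x + 3).
q, r = -6, -9
roots = cubic_roots(q, r)

# Each computed value should (numerically) satisfy the cubic.
residuals = [abs(z ** 3 + q * z + r) for z in roots]
```

For this sample, $D = 49$, $\alpha = 2$, and $\beta = 1$, so the first root is $\alpha + \beta = 3$; the other two are the complex roots of $x^2 + 3x + 3$.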

An interesting aspect of the cubic formula is the so-called casus irreducibilis; 
the formula for the roots of an irreducible cubic in $\mathbb{Q}[x]$ having all roots real (as 
in Example 5.9) requires the presence of complex numbers (see Rotman, Galois 
Theory, 2d ed.). 

Casus Irreducibilis. If $f(x) = x^3 + qx + r \in \mathbb{Q}[x]$ is an irreducible polynomial 
having real roots, then any radical extension $K_t/\mathbb{Q}$ containing the splitting field 
of $f(x)$ is not real; that is, $K_t \not\subseteq \mathbb{R}$. 

It follows that one cannot modify the definition of $f(x)$ being solvable by 
radicals so that a splitting field $E$ is equal to the last term $K_t$ in a tower of pure 
extensions (instead of $E \subseteq K_t$). 


Quartics 

Let $f(X) = X^4 + bX^3 + cX^2 + dX + e \in \mathbb{Q}[x]$. The change of variable 
$X = x - \frac{1}{4}b$ yields a new polynomial $\widetilde{f}(x) = x^4 + qx^2 + rx + s \in \mathbb{Q}[x]$; 
moreover, the splitting field $E$ of $f(X)$ is equal to the splitting field of $\widetilde{f}(x)$, for 
if $u$ is a root of $\widetilde{f}(x)$, then $u - \frac{1}{4}b$ is a root of $f(X)$. Recall 

$$\widetilde{f}(x) = x^4 + qx^2 + rx + s = (x^2 + jx + \ell)(x^2 - jx + m),$$

and Eq. (12) shows that $j^2$ is a root of a cubic, 

$$(j^2)^3 + 2q(j^2)^2 + (q^2 - 4s)j^2 - r^2.$$

Define pure extensions 

$$\mathbb{Q} = K_0 \subseteq K_1 \subseteq K_2 \subseteq K_3$$

as in the cubic case, so that $j^2 \in K_2$. Define $K_4 = K_3(j)$, and note that 
Eqs. (11) on page 441 give $\ell, m \in K_4$. Finally, define $K_5 = K_4(\sqrt{j^2 - 4\ell})$ 
and $K_6 = K_5(\sqrt{j^2 - 4m})$. The quartic formula gives $E \subseteq K_6$ (this tower can 
be shortened). 

We have seen that quadratics, cubics, and quartics are solvable by radicals. 
Conversely, if $f(x) \in \mathbb{Q}[x]$ is a polynomial that is solvable by radicals, then 
there is a formula of the desired kind that expresses its roots in terms of its 
coefficients. For suppose that 

$$\mathbb{Q} = K_0 \subseteq K_1 \subseteq \cdots \subseteq K_t$$

is a radical extension with splitting field $E \subseteq K_t$. Let $z$ be a root of $f(x)$. Now 
$K_t = K_{t-1}(u)$, where $u$ is an $m$th root of some element $\alpha \in K_{t-1}$; hence, $z$ can 
be expressed in terms of $u$ and $K_{t-1}$; that is, $z$ can be expressed in terms of $\sqrt[m]{\alpha}$ 
and $K_{t-1}$. But $K_{t-1} = K_{t-2}(v)$, where some power of $v$ lies in $K_{t-2}$. Hence, 
$z$ can be expressed in terms of $u$, $v$, and $K_{t-2}$. Ultimately, $z$ is expressed by a 
formula analogous to those of the classical formulas. 

Translation into Group Theory 

The second stage of the strategy involves investigating the effect of $f(x)$ being 
solvable by radicals on its Galois group. 

Suppose that $k(u)/k$ is a pure extension of type 6; that is, $u^6 \in k$. Now 
$k(u^3)/k$ is a pure extension of type 2, for $(u^3)^2 = u^6 \in k$, and $k(u)/k(u^3)$ is 
obviously a pure extension of type 3. Thus, $k(u)/k$ can be replaced by a tower 
of pure extensions $k \subseteq k(u^3) \subseteq k(u)$ of types 2 and 3. More generally, one may 
assume, given a tower of pure extensions, that each field is of prime type over its 
predecessor: if $k \subseteq k(u)$ is of type $m$, then factor $m = p_1 \cdots p_q$, where the $p$'s 
are (not necessarily distinct) primes, and replace $k \subseteq k(u)$ by 

$$k \subseteq k(u^{m/p_1}) \subseteq k(u^{m/p_1 p_2}) \subseteq \cdots \subseteq k(u).$$
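
The refinement just described is purely arithmetic, and it can be sketched in a few lines of Python (an illustration of ours; the function names are hypothetical). Given the type $m$, the sketch lists, for each step of the refined tower, the exponent $e$ of the generator $u^e$ adjoined at that step together with the prime type of that step.

```python
def prime_factors(m):
    """Return the prime factors of m >= 1 with multiplicity, in increasing order."""
    factors = []
    p = 2
    while p * p <= m:
        while m % p == 0:
            factors.append(p)
            m //= p
        p += 1
    if m > 1:
        factors.append(m)
    return factors


def prime_type_tower(m):
    """Refine a pure extension k(u)/k of type m into steps of prime type.

    Returns a list of pairs (e, p): the step adjoins u**e and is a pure
    extension of type p over the previous field, because (u**e)**p = u**(e*p)
    already lies in the previous field.
    """
    steps = []
    exponent = m
    for p in prime_factors(m):
        exponent //= p
        steps.append((exponent, p))
    return steps
```

For example, `prime_type_tower(6)` returns `[(3, 2), (1, 3)]`, which encodes the tower $k \subseteq k(u^3) \subseteq k(u)$ of types 2 and 3 from the text.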
The next theorem, a key result allowing us to translate solvability by radicals 
into the language of Galois groups, shows why normal extensions are so called. 
The reader should recognize that extensions of fields seem to be playing the same 
role as do subgroups of groups. 

Theorem 5.31. Let $k \subseteq K \subseteq E$ be a tower of fields, where both $K/k$ and $E/k$ 
are normal extensions. Then $\mathrm{Gal}(E/K)$ is a normal subgroup of $\mathrm{Gal}(E/k)$, and 

$$\mathrm{Gal}(E/k)/\mathrm{Gal}(E/K) \cong \mathrm{Gal}(K/k).$$

Proof. Since $K/k$ is a normal extension, it is a splitting field of some 
polynomial in $k[x]$, by Theorem 5.29. Hence, if $\sigma \in \mathrm{Gal}(E/k)$, then $\sigma(K) = K$, 
by Corollary 5.19. Define $\rho : \mathrm{Gal}(E/k) \to \mathrm{Gal}(K/k)$ by $\sigma \mapsto \sigma|K$. It is 
easy to see, as in the proof of Theorem 5.21, that $\rho$ is a homomorphism and 
that $\ker \rho = \mathrm{Gal}(E/K)$. It follows that $\mathrm{Gal}(E/K)$ is a normal subgroup of 
$\mathrm{Gal}(E/k)$. Now $\rho$ is surjective: if $\tau \in \mathrm{Gal}(K/k)$, then Proposition 5.22(i) 
applies to show that there is $\sigma \in \mathrm{Gal}(E/k)$ extending $\tau$; that is, $\rho(\sigma) = \sigma|K = \tau$. 
The first isomorphism theorem completes the proof. • 

The next result will be needed when we apply Theorem 5.31. 

Lemma 5.32. Let $B$ be a finite extension of a field $k$. 

(i) There is a finite extension $F/B$ with $F/k$ a normal extension. 

(ii) If $B$ is a radical extension of $k$, then there is a tower of fields $k \subseteq B \subseteq F$ 
with $F/k$ both a normal extension and a radical extension. Moreover, the 
set of types of the pure extensions occurring in a radical tower of $F/k$ is 
the same as the set of types in the radical tower of $B/k$. 

Proof. 

(i) Since $B$ is a finite extension, $B = k(z_1, \ldots, z_\ell)$ for elements $z_1, \ldots, z_\ell$. 
For each $i$, Theorem 3.116 gives an irreducible polynomial $p_i(x) \in k[x]$ with 
$p_i(z_i) = 0$. Define $f(x) = p_1(x) \cdots p_\ell(x) \in k[x] \subseteq B[x]$ and define $F$ to be a 
splitting field of $f(x)$ over $B$. Since $f(x) \in k[x]$, we have $F/k$ a splitting field 
of $f(x)$ over $k$ as well, and so $F/k$ is a normal extension. 

(ii) Now 

$$F = k(z_1, z_1', z_1'', \ldots;\ z_2, z_2', z_2'', \ldots;\ \ldots;\ z_\ell, z_\ell', z_\ell'', \ldots),$$

where $z_i, z_i', z_i'', \ldots$ are the roots of $p_i(x)$. We claim that 

$$F = k(\{\sigma(z_1), \ldots, \sigma(z_\ell) : \sigma \in \mathrm{Gal}(F/k)\}).$$

Clearly, the right hand side is contained in $F$, and so it suffices to prove the 
reverse inclusion. In fact, it suffices to prove that $z_i' = \sigma(z_i)$ for some $\sigma$ [where $z_i'$ 
now denotes any root of $p_i(x)$ for some $i$]. By Proposition 3.116(iii), there is an 
isomorphism $\gamma : k(z_i) \to k(z_i')$ fixing $k$ and taking $z_i \mapsto z_i'$, and, by Proposition 5.22(i), each 
such $\gamma$ extends to an isomorphism $\sigma \in \mathrm{Gal}(F/k)$. Therefore, $z_i' = \sigma(z_i)$, as 
desired. 

Since $B$ is a radical extension of $k$, there are $u_1, \ldots, u_t \in B$ and a radical 
tower, 

$$k \subseteq k(u_1) \subseteq k(u_1, u_2) \subseteq \cdots \subseteq k(u_1, \ldots, u_t) = B, \qquad (3)$$

with each $k(u_1, \ldots, u_{i+1})$ a pure extension of $k(u_1, \ldots, u_i)$. We now show that 
$F$ is a radical extension of $k$. Let $\mathrm{Gal}(F/k) = \{1 = \sigma_1, \sigma_2, \ldots, \sigma_n\}$. Define 

$$B_1 = k(u_1, \sigma_2(u_1), \sigma_3(u_1), \ldots, \sigma_n(u_1)).$$

There is a radical tower 

$$k \subseteq k(u_1) \subseteq k(u_1, \sigma_2(u_1)) \subseteq k(u_1, \sigma_2(u_1), \sigma_3(u_1)) \subseteq \cdots \subseteq B_1$$

displaying $B_1$ as a radical extension of $k$. In more detail, if $u_1^p$ lies in $k$, then 
$\sigma_j(u_1)^p = \sigma_j(u_1^p) \in \sigma_j(k) = k \subseteq k(u_1, \sigma_2(u_1), \ldots, \sigma_{j-1}(u_1))$. Note that these 
pure extensions all have the same type, namely, $p$, which is a type in the original 
radical tower (3). Define 

$$B_2 = B_1(u_2, \sigma_2(u_2), \sigma_3(u_2), \ldots, \sigma_n(u_2));$$

there is a radical tower: 

$$B_1 \subseteq B_1(u_2) \subseteq B_1(u_2, \sigma_2(u_2)) \subseteq B_1(u_2, \sigma_2(u_2), \sigma_3(u_2)) \subseteq \cdots \subseteq B_2.$$

Now $B_2$ is a radical extension of $B_1$: if $u_2^q \in k(u_1) \subseteq B_1$, then $\sigma_j(u_2)^q = 
\sigma_j(u_2^q) \in \sigma_j(B_1) \subseteq B_1 \subseteq B_1(u_2, \sigma_2(u_2), \ldots, \sigma_{j-1}(u_2))$. Again, these pure 
extensions have the same type, namely, $q$, which is a type in the radical tower (3). 
Since $B_1$ is a radical extension of $k$, the radical tower from $k$ to $B_1$ followed by 
the radical tower from $B_1$ to $B_2$ displays $B_2$ as a radical extension of $k$. For each 
$i \ge 2$, define $B_{i+1}$ to be $B_i$ with $u_{i+1}, \sigma_2(u_{i+1}), \sigma_3(u_{i+1}), \ldots$ adjoined. The argument 
above shows that $B_{i+1}$ is a radical extension of $k$. Finally, since $F = B_t$, we 
have shown that $F$ is a radical extension of $k$, and that the statement about the 
types of its pure extensions is correct. • 


Lemma 5.33. Let $k(u)/k$ be a pure extension of prime type $p$ distinct from 
the characteristic of $k$. If $k$ contains the $p$th roots of unity and if $u \notin k$, then 
$\mathrm{Gal}(k(u)/k) \cong \mathbb{I}_p$. 

Proof. Denote $\mathrm{Gal}(k(u)/k)$ by $G$. Let $a = u^p \in k$. If $\omega$ is a primitive $p$th root 
of unity, then the roots $1, \omega, \ldots, \omega^{p-1}$ are distinct [because $p \ne \mathrm{char}(k)$], and 
the roots of $f(x) = x^p - a$ are $u, \omega u, \omega^2 u, \ldots, \omega^{p-1}u$; since $\omega \in k$, it follows 
that $k(u)$ is the splitting field of $f(x)$ over $k$. If $\sigma \in G$, then $\sigma(u) = \omega^i u$ for 
some $i$, by Theorem 5.18(i). Define $\varphi : G \to \mathbb{I}_p$ by $\varphi(\sigma) = [i]$, the congruence 
class of $i$ mod $p$. To see that $\varphi$ is a homomorphism, suppose that $\tau \in G$ and 
$\varphi(\tau) = [j]$. Then $\sigma\tau(u) = \sigma(\omega^j u) = \omega^{i+j}u$, so that $\varphi(\sigma\tau) = [i + j] = 
[i] + [j] = \varphi(\sigma) + \varphi(\tau)$. Now $\ker \varphi = \{1\}$, for if $\varphi(\sigma) = [0]$, then $\sigma(u) = u$; 
since $\sigma$ fixes $k$, by the definition of $G = \mathrm{Gal}(k(u)/k)$, Proposition 5.20 gives 
$\sigma = 1$. Finally, we show that $\varphi$ is a surjection. Since $u \notin k$, the automorphism 
taking $u \mapsto \omega u$ is not the identity, so that $\mathrm{im}\,\varphi \ne \{[0]\}$. But $\mathbb{I}_p$, having prime 
order $p$, has no subgroups aside from $\{[0]\}$ and $\mathbb{I}_p$, so that $\mathrm{im}\,\varphi = \mathbb{I}_p$. Therefore, 
$\varphi$ is an isomorphism. • 

Here is the heart of the translation we have been seeking. 

Theorem 5.34. Let $k = K_0 \subseteq K_1 \subseteq K_2 \subseteq \cdots \subseteq K_t$ be a radical extension of 
a field $k$. Assume, for each $i$, that each $K_i$ is a pure extension of prime type $p_i$ 
over $K_{i-1}$, where $p_i \ne \mathrm{char}(k)$, and that $k$ contains all the $p_i$th roots of unity. 

(i) If $K_t$ is a splitting field over $k$, then there is a sequence of subgroups 

$$\mathrm{Gal}(K_t/k) = G_0 \ge G_1 \ge G_2 \ge \cdots \ge G_t = \{1\},$$

with each $G_{i+1}$ a normal subgroup of $G_i$ and with $G_i/G_{i+1}$ cyclic of 
prime order. 

(ii) If $f(x)$ is solvable by radicals, then its Galois group $\mathrm{Gal}(E/k)$ is a 
quotient of a solvable group. 

Proof. 

(i) Defining $G_i = \mathrm{Gal}(K_t/K_i)$ gives a sequence of subgroups of $\mathrm{Gal}(K_t/k)$. 
Since $K_1 = k(u)$, where $u^{p_1} \in k$, the assumption that $k$ contains a primitive $p_1$th 
root of unity shows that $K_1$ is a splitting field of $x^{p_1} - u^{p_1}$ (see Example 5.15). 
We may thus apply Theorem 5.31 to see that $G_1 = \mathrm{Gal}(K_t/K_1)$ is a normal 
subgroup of $G_0 = \mathrm{Gal}(K_t/k)$ and that $G_0/G_1 \cong \mathrm{Gal}(K_1/k) = \mathrm{Gal}(K_1/K_0)$. 
By Lemma 5.33, $G_0/G_1 \cong \mathbb{I}_{p_1}$. This argument can be repeated for each $i$. 

(ii) There is a radical tower, 

$$k = K_0 \subseteq K_1 \subseteq K_2 \subseteq \cdots \subseteq K_t,$$

with each $K_{i+1}/K_i$ a pure extension of prime type, and with $E \subseteq K_t$. By 
Lemma 5.32, this radical tower can be lengthened; there is a radical tower 

$$k = K_0 \subseteq K_1 \subseteq \cdots \subseteq K_t \subseteq \cdots \subseteq F,$$

where $F/k$ is a normal extension. Moreover, the (prime) types of the pure 
extensions in this longer radical extension are the same as those occurring in the 
original radical tower. Therefore, $k$ contains all those roots of unity required in 
the hypothesis of part (i) to show that $\mathrm{Gal}(F/k)$ is a solvable group. 

Since $E$ is a splitting field, if $\sigma \in \mathrm{Gal}(F/k)$, then $\sigma|E \in \mathrm{Gal}(E/k)$, and so 
$\rho : \sigma \mapsto \sigma|E$ is a homomorphism $\mathrm{Gal}(F/k) \to \mathrm{Gal}(E/k)$. Finally, 
Proposition 5.22(i) shows that $\rho$ is surjective; since $F$ is a splitting field over $k$, every 
$\sigma \in \mathrm{Gal}(E/k)$ extends to some $\widetilde{\sigma} \in \mathrm{Gal}(F/k)$. • 

We shall see that not every group satisfies the conclusion of Theorem 5.34(i); 
those groups that do enjoy that property have a name. 

Definition. A normal series of a group $G$ is a sequence of subgroups 

$$G = G_0 \ge G_1 \ge G_2 \ge \cdots \ge G_t = \{1\}$$

with each $G_{i+1}$ a normal subgroup of $G_i$; the factor groups of this series are the 
quotient groups 

$$G_0/G_1,\ G_1/G_2,\ \ldots,\ G_{t-1}/G_t.$$

A finite group $G$ is called solvable if $G = \{1\}$ or if $G$ has a normal series each 
of whose factor groups has prime order. 

In this language, Theorem 5.34 says that $\mathrm{Gal}(K_t/k)$ is a solvable group if 
$K_t$ is a radical extension of $k$ and $k$ contains appropriate roots of unity. 

Example 5.35. 

(i) $S_4$ is a solvable group. 

Consider the chain of subgroups 

$$S_4 \ge A_4 \ge V \ge W \ge \{1\},$$

where $V$ is the four-group and $W$ is any subgroup of $V$ of order 2. This 
is a normal series: first, it begins with $S_4$ and ends with $\{1\}$; second, each 
term is a normal subgroup of its predecessor: $A_4 \lhd S_4$; $V \lhd A_4$ (in fact, 
$V \lhd S_4$, a stronger statement); $W \lhd V$ because $V$ is abelian. Now $|S_4/A_4| = 
|S_4|/|A_4| = 24/12 = 2$, $|A_4/V| = |A_4|/|V| = 12/4 = 3$, $|V/W| = 
|V|/|W| = 4/2 = 2$, and $|W/\{1\}| = |W| = 2$. Thus, each factor group has 
prime order, and so $S_4$ is solvable. 

(ii) Every finite abelian group $G$ is solvable. 

We prove this by induction on $|G|$; the base step $|G| = 1$ is trivially 
true. For the inductive step, recall Proposition 2.122: if $G$ is a finite abelian 
group, then $G$ has a subgroup of order $d$ for every divisor $d$ of $|G|$. Since 
$|G| > 1$, there is a factorization $|G| = pd$ for some prime $p$, and so there 
is a subgroup $H$ of $G$ of order $d$. Now $H \lhd G$, because $G$ is abelian, and 
$|G/H| = |G|/|H| = pd/d = p$. By induction, there is a normal series 
from $H$ to $\{1\}$ with factor groups of prime orders, from which it follows 
that $G$ is a solvable group. 

(iii) $S_5$ is not a solvable group (in fact, $S_n$ is not solvable for all $n \ge 5$). 

In Exercise 2.123 on page 205, we saw, for all $n \ge 5$, that $A_n$ is the 
only proper nontrivial normal subgroup of $S_n$ (the key fact in the proof is 
that $A_n$ is a simple group). It follows that $S_n$ has only one normal series, 
namely, 

$$S_n > A_n > \{1\}$$

(this is not quite true; another normal series is $S_n > A_n \ge A_n > \{1\}$, 
which repeats a term; of course, this repetition only contributes the new 
factor group $A_n/A_n = \{1\}$). But the factor groups of this normal series are 
$S_n/A_n \cong \mathbb{I}_2$ and $A_n/\{1\} \cong A_n$, and the latter group is not of prime order. 
Therefore, $S_n$ is not a solvable group for $n \ge 5$. ◄ 
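
The normal series of Example 5.35(i) is small enough to check by brute force. The following Python sketch is our own illustration: permutations of $\{0, 1, 2, 3\}$ are stored as tuples, and the names `compose`, `is_normal`, and so on are hypothetical. It verifies that each term of $S_4 \ge A_4 \ge V \ge W \ge \{1\}$ is normal in its predecessor and computes the orders of the factor groups. Here $W$ is taken to be the subgroup generated by $(0\ 1)(2\ 3)$, one of the three possible choices.

```python
from itertools import permutations


def compose(s, t):
    """Return the permutation s∘t, acting as i -> s[t[i]]."""
    return tuple(s[t[i]] for i in range(len(t)))


def inverse(s):
    """Return the inverse permutation of s."""
    inv = [0] * len(s)
    for i, j in enumerate(s):
        inv[j] = i
    return tuple(inv)


def is_even(s):
    """A permutation is even when it has an even number of inversions."""
    n = len(s)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if s[i] > s[j])
    return inversions % 2 == 0


def is_normal(H, G):
    """Check that g h g^{-1} lies in H for all g in G and h in H."""
    return all(compose(compose(g, h), inverse(g)) in H for g in G for h in H)


S4 = set(permutations(range(4)))
A4 = {s for s in S4 if is_even(s)}
V = {(0, 1, 2, 3), (1, 0, 3, 2), (2, 3, 0, 1), (3, 2, 1, 0)}  # the four-group
W = {(0, 1, 2, 3), (1, 0, 3, 2)}                              # a subgroup of order 2
TRIVIAL = {(0, 1, 2, 3)}

series = [S4, A4, V, W, TRIVIAL]
normal_steps = [is_normal(H, G) for G, H in zip(series, series[1:])]
factor_orders = [len(G) // len(H) for G, H in zip(series, series[1:])]
```

The factor orders come out as 2, 3, 2, 2, all prime. Note that `is_normal(W, S4)` is false: a normal series only requires each term to be normal in its predecessor, not in the whole group.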

Theorem 5.36. Every quotient $G/N$ of a solvable group $G$ is itself a solvable 
group. 

Remark. One can also prove that every subgroup of a solvable group is itself 
a solvable group. ◄ 

Proof. By the first isomorphism theorem for groups, quotient groups are 
isomorphic to homomorphic images, and so it suffices to prove that if $f : G \to H$ 
is a surjection (for some group $H$), then $H$ is a solvable group. 

Let $G = G_0 \ge G_1 \ge G_2 \ge \cdots \ge G_t = \{1\}$ be a sequence of subgroups as 
in the definition of solvable group. Then 

$$H = f(G_0) \ge f(G_1) \ge f(G_2) \ge \cdots \ge f(G_t) = \{1\}$$

is a sequence of subgroups of $H$. If $f(x_{i+1}) \in f(G_{i+1})$ and $u_i \in f(G_i)$, then 
$u_i = f(x_i)$ and $u_i f(x_{i+1}) u_i^{-1} = f(x_i)f(x_{i+1})f(x_i)^{-1} = f(x_i x_{i+1} x_i^{-1}) \in 
f(G_{i+1})$, because $G_{i+1} \lhd G_i$; that is, $f(G_{i+1})$ is a normal subgroup of $f(G_i)$. 
The map $\varphi : G_i \to f(G_i)/f(G_{i+1})$, defined by $x_i \mapsto f(x_i)f(G_{i+1})$, is a 
surjection, for it is the composite of the surjections $G_i \to f(G_i)$ and the natural 
map $f(G_i) \to f(G_i)/f(G_{i+1})$. Since $G_{i+1} \le \ker \varphi$, this map induces a 
surjection $G_i/G_{i+1} \to f(G_i)/f(G_{i+1})$, namely, $x_i G_{i+1} \mapsto f(x_i)f(G_{i+1})$. Now 
$G_i/G_{i+1}$ is cyclic of prime order, so that its quotient $f(G_i)/f(G_{i+1})$ is a cyclic 
group of order 1 or order a prime. Thus, deleting any repetitions if necessary, 
$H = f(G)$ has a series in which all the quotient groups are cyclic of prime order; 
therefore, $H$ is a solvable group. • 

Here is the main criterion. 

Theorem 5.37 (Galois). Let $k$ be a field and let $f(x) \in k[x]$. If $f(x)$ is 
solvable by radicals, then its Galois group $\mathrm{Gal}(E/k)$ is a solvable group if $k$ has 
“enough” roots of unity. 

Remark. By “enough” roots of unity, we mean those arising from types of 
a radical tower displaying $f(x)$ being solvable by radicals. Exercise 5.27 on 
page 468 shows how to eliminate this hypothesis. ◄ 

Proof. By Theorem 5.34(ii), $\mathrm{Gal}(E/k)$ is a quotient of a solvable group and, by 
Theorem 5.36, any quotient of a solvable group is itself solvable. • 

If $k$ has characteristic 0, then the converse of Theorem 5.37 is true; it was also 
proved by Galois (see my book Advanced Modern Algebra, p. 235). However, 
the converse is false in characteristic $p$. If $f(x) = x^p - x - t \in k[x]$, where 
$k = \mathbb{F}_p(t)$, then the Galois group of $f(x)$ over $k$ is cyclic of order $p$, but $f(x)$ is 
not solvable by radicals (see Proposition 4.56 in Advanced Modern Algebra). 

In 1827, Abel proved a theorem saying, in group-theoretic language not 
known to him, that if the Galois group of a polynomial $f(x)$ is commutative, 
then $f(x)$ is solvable by radicals. This is why abelian groups are so called. 
Since every finite abelian group is solvable [Example 5.35(ii)], Abel’s theorem 
is a special case of Galois’ theorem. 

It is not difficult to prove that every subgroup of $S_4$ is a solvable group. 
Since $S_2$ and $S_3$ are subgroups of $S_4$, Theorem 5.21 shows that the Galois group 
of every quadratic, cubic, and quartic polynomial is a solvable group. Thus, 
the converse of Galois’ theorem shows that if $k$ has characteristic 0, then every 
polynomial $f(x) \in k[x]$ with $\deg(f) \le 4$ is solvable by radicals (of course, we 
already know this because we have proved the classical formulas). 

We now complete the discussion by showing, for $n \ge 5$, that the general 
polynomial of degree $n$ is not solvable by radicals. 

Theorem 5.38 (Abel-Ruffini). For all $n \ge 5$, the general polynomial of 
degree $n$, 

$$f(x) = (x - y_1)(x - y_2) \cdots (x - y_n),$$

is not solvable by radicals. 

Proof. In Example 5.17, we saw that if $F$ is a field, $E = F(y_1, \ldots, y_n)$ is the 
field of all rational functions in $n$ variables $y_1, \ldots, y_n$ with coefficients in $F$, 
and $k = F(a_0, \ldots, a_n)$, where the $a_i$ are the coefficients of $f(x)$, then $E$ is the 
splitting field of $f(x)$ over $k$. In particular, if we choose $F = \mathbb{C}$, then $k$ is an 
extension field of $\mathbb{C}$, and hence $k$ contains all the roots of unity. 

We claim that $S_n$ is (isomorphic to) a subgroup of $\mathrm{Gal}(E/k)$. Recall 
Exercise 3.50(ii) on page 249: If $A$ and $R$ are domains and $\varphi : A \to R$ is an 
isomorphism, then $[a, b] \mapsto [\varphi(a), \varphi(b)]$ is an isomorphism $\mathrm{Frac}(A) \to \mathrm{Frac}(R)$. 
If $\sigma \in S_n$, then there is an isomorphism $\widetilde{\sigma}$ of $F[y_1, \ldots, y_n]$ with itself defined 
by $f(y_1, \ldots, y_n) \mapsto f(y_{\sigma 1}, \ldots, y_{\sigma n})$; that is, $\widetilde{\sigma}$ just permutes the variables of 
a polynomial in several variables. By Exercise 3.50 on page 249, $\widetilde{\sigma}$ extends to 
an automorphism $\sigma^*$ of $E$, for $E = \mathrm{Frac}(F[y_1, \ldots, y_n])$. Eqs. (1) on page 448 
show that $\sigma^*$ fixes $k$, and so $\sigma^* \in \mathrm{Gal}(E/k)$. Using Proposition 5.20, it is easy to 
see that $\sigma \mapsto \sigma^*$ is an isomorphism of $S_n$ with a subgroup of $\mathrm{Gal}(E/k)$; in fact, 
Theorem 5.21 shows that $\mathrm{Gal}(E/k) \cong S_n$. Therefore, if $n \ge 5$, then $\mathrm{Gal}(E/k)$ 
is not a solvable group, and Theorem 5.37 shows that $f(x)$ is not solvable by 
radicals. • 

We have proved that there is no generalization of the classical formulas to 
polynomials of degree $n \ge 5$. 

Example 5.39. 

Here is an explicit example of a quintic polynomial which is not solvable by 
radicals. If $f(x) = x^5 - 4x + 2 \in \mathbb{Q}[x]$, then $f(x)$ is irreducible over $\mathbb{Q}$, 
by Eisenstein’s criterion (Theorem 3.103). If $E/\mathbb{Q}$ is the splitting field of $f(x)$ 
contained in $\mathbb{C}$ and $G = \mathrm{Gal}(E/\mathbb{Q})$, then Corollary 5.27 gives $|G| = [E : \mathbb{Q}]$ 
divisible by 5. 

We now use some calculus. There are exactly two real roots of the derivative 
$f'(x) = 5x^4 - 4$, namely, $\pm\sqrt[4]{4/5} \approx \pm .946$, and so $f(x)$ has two critical points. 
Now $f(\sqrt[4]{4/5}) < 0$ and $f(-\sqrt[4]{4/5}) > 0$, so that $f(x)$ has one relative maximum 
and one relative minimum. It follows easily that $f(x)$ has exactly three real 
roots (although we will not need to know their values, they are, approximately, 
$-1.5185$, $0.5085$, and $1.2435$; the complex roots are $-.1168 \pm 1.4385i$.) 

The Galois group $G$ is isomorphic to a subgroup of $S_X \cong S_5$, where $X$ is the 
set of 5 roots of $f(x)$. Now $G$ has an element $\sigma$ of order 5, by Cauchy’s theorem 
(Theorem 2.145), which must be a 5-cycle, for these are the only elements of 
order 5 in $S_5$. The restriction of complex conjugation to $E$, call it $\tau$, is a 
transposition, for $\tau$ interchanges the two complex roots while it fixes the three real roots. 
But $S_5$ is generated by any transposition and any 5-cycle, by Exercise 2.115 on 
page 205, so that $G = \mathrm{Gal}(E/\mathbb{Q}) \cong S_5$. Therefore, $\mathrm{Gal}(E/\mathbb{Q})$ is not a solvable 
group, by Example 5.35(iii), and so Theorem 5.37 says that $f(x)$ is not solvable 
by radicals. ◄ 
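
The two elementary facts used in Example 5.39, Eisenstein’s criterion at the prime 2 and the count of exactly three real roots, can be checked numerically. The Python sketch below is our own illustration (the helper names `eisenstein` and `bisect` are hypothetical). It uses the fact that $f$ is monotone on each of the three intervals cut out by the two critical points $\pm\sqrt[4]{4/5}$, so each sign change of $f$ across such an interval accounts for exactly one real root; the sample points $\pm 2$ stand in for $\pm\infty$.

```python
def f(x):
    """The quintic of Example 5.39."""
    return x ** 5 - 4 * x + 2


def eisenstein(coeffs, p):
    """Eisenstein's criterion for integer coefficients listed from the
    leading coefficient down to the constant term."""
    lead, *rest = coeffs
    return (lead % p != 0
            and all(c % p == 0 for c in rest)
            and rest[-1] % (p * p) != 0)


def bisect(lo, hi, tol=1e-12):
    """Locate a root of f in [lo, hi], assuming f changes sign there."""
    sign_lo = f(lo) > 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (f(mid) > 0) == sign_lo:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


irreducible_by_eisenstein = eisenstein([1, 0, 0, 0, -4, 2], 2)

c = (4 / 5) ** 0.25                 # positive critical point, about 0.946
points = [-2.0, -c, c, 2.0]         # f is monotone between consecutive points
sign_changes = sum(1 for a, b in zip(points, points[1:]) if f(a) * f(b) < 0)
real_roots = [bisect(a, b) for a, b in zip(points, points[1:]) if f(a) * f(b) < 0]
```

The three values in `real_roots` agree, to the precision quoted in the example, with the approximations $-1.5185$, $0.5085$, and $1.2435$.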

An (impractical) algorithm computing Galois groups is given in van der 
Waerden, Modern Algebra, vol. I, pp. 189-192. However, more advanced 
expositions of Galois theory show how to compute explicitly Galois groups of 
$f(x) \in \mathbb{Q}[x]$ when $\deg(f) \le 4$. 


Exercises 

*5.15 Let $\varphi : A \to H$ be a group homomorphism. If $B \lhd A$ and $B \le \ker \varphi$, prove 
that the induced map $\varphi^* : A/B \to H$, given by $aB \mapsto \varphi(a)$, is a well-defined 
homomorphism with $\mathrm{im}\,\varphi^* = \mathrm{im}\,\varphi$. 

*5.16 If $z \in \mathbb{C}$ is a constructible number, prove that $\mathbb{Q}(i, z)/\mathbb{Q}$ is a radical extension. 

5.17 Let $k$ be a field and let $f(x) \in k[x]$. Prove that if $E$ and $E'$ are splitting fields of 
$f(x)$ over $k$, then $\mathrm{Gal}(E/k) \cong \mathrm{Gal}(E'/k)$. 

5.18 Prove that $\mathbb{F}_3[x]/(x^3 - x^2 - 1) \cong \mathbb{F}_3[x]/(x^3 - x^2 + x - 1)$. 

5.19 Is $\mathbb{F}_4$ a subfield of $\mathbb{F}_8$? 

5.20 Let $k$ be a field of characteristic $p > 0$, and define the Frobenius map $F\colon k \to k$
by $F\colon a \mapsto a^p$.

(i) Prove that $F\colon k \to k$ is an injection.

(ii) When $k$ is finite, prove that $F$ is an automorphism fixing the prime field
$\mathbb{F}_p$. Conclude that $F \in \mathrm{Gal}(k/\mathbb{F}_p)$.

(iii) Prove that if $k$ is finite, then every $a \in k$ has a $p$th root; that is, there is
$b \in k$ with $b^p = a$.

5.21 Let $q = p^n$ for some prime $p$ and some $n \ge 1$.

(i) If $\alpha$ is a generator of $\mathbb{F}_q^\times$, prove that $\mathbb{F}_q = \mathbb{F}_p(\alpha)$.

(ii) Prove that the irreducible polynomial $p(x) \in \mathbb{F}_p[x]$ of $\alpha$ has degree $n$.

(iii) Prove that if $G = \mathrm{Gal}(\mathbb{F}_q/\mathbb{F}_p)$, then $|G| \le n$.

(iv) Prove that $\mathrm{Gal}(\mathbb{F}_q/\mathbb{F}_p)$ is cyclic of order $n$ with generator the
Frobenius $F$.

5.22 Prove that the following statements are equivalent for a quadratic
$f(x) = ax^2 + bx + c \in \mathbb{Q}[x]$.

(i) $f(x)$ is irreducible.

(ii) $\sqrt{b^2 - 4ac}$ is not rational.

(iii) $\mathrm{Gal}(\mathbb{Q}(\sqrt{b^2 - 4ac})/\mathbb{Q})$ has order 2.

*5.23 Let $E/k$ be a splitting field of a polynomial $f(x) \in k[x]$. If $\deg(f) = n$, prove
that $[E : k] \le n!$. Conclude that $E/k$ is a finite extension.

5.24 What is the degree of the splitting field of $x^{30} - 1$ over $\mathbb{F}_5$?

5.25 Prove that if $f(x) \in \mathbb{Q}[x]$ has a rational root $a$, then its Galois group is the same
as the Galois group of $f(x)/(x - a)$.

*5.26 (i) Let $H$ be a normal subgroup of a finite group $G$. If both $H$ and $G/H$ are
solvable groups, prove that $G$ is a solvable group.

(ii) If $H$ and $K$ are solvable groups, prove that $H \times K$ is solvable.

*5.27 We are going to improve Theorem 5.37 by eliminating the hypothesis involving
roots of unity: if $k$ is a field and $f(x) \in k[x]$ is solvable by radicals, then its Galois
group $\mathrm{Gal}(E/k)$ is a solvable group.

Since $f(x)$ is solvable by radicals, there is a radical tower $k = K_0 \subseteq \cdots \subseteq F$
with $E \subseteq F$; moreover, we were able to assume that $F/k$ is a splitting field of
some polynomial. Finally, if $k$ contains a certain set $\Omega$ of $m$th roots of unity, then
$\mathrm{Gal}(E/k)$ is solvable.

(i) Define $E^*/E$ to be a splitting field of $x^m - 1$, and define $k^* = k(\Omega)$.
Prove that $E^*$ is a splitting field of $f(x)$ over $k^*$, and conclude that
$\mathrm{Gal}(E^*/k^*)$ is solvable.

(ii) Prove that $\mathrm{Gal}(E^*/k^*) \lhd \mathrm{Gal}(E^*/k)$, and that
$\mathrm{Gal}(E^*/k)/\mathrm{Gal}(E^*/k^*) \cong \mathrm{Gal}(k^*/k)$.

(iii) Use Exercise 5.26 to prove that $\mathrm{Gal}(E^*/k)$ is solvable.

(iv) Prove that $\mathrm{Gal}(E^*/E) \lhd \mathrm{Gal}(E^*/k)$, and that
$\mathrm{Gal}(E^*/k)/\mathrm{Gal}(E^*/E) \cong \mathrm{Gal}(E/k)$, and conclude that $\mathrm{Gal}(E/k)$ is solvable.

*5.28 Let $f(x) \in \mathbb{Q}[x]$ be an irreducible cubic with Galois group $G$.

(i) Prove that if $f(x)$ has exactly one real root, then $G \cong S_3$.

(ii) Find the Galois group of $f(x) = x^3 - 2 \in \mathbb{Q}[x]$.

(iii) Find a cubic polynomial $g(x) \in \mathbb{Q}[x]$ whose Galois group has order 3.

*5.29 (i) If $k$ is a field and $f(x) \in k[x]$ has derivative $f'(x)$, prove that either
$f'(x) = 0$ or $\deg(f') < \deg(f)$.

(ii) If $k$ is a field of characteristic 0, prove that an irreducible polynomial
$p(x) \in k[x]$ has no repeated roots; that is, if $E$ is the splitting field of
$p(x)$, then there is no $\alpha \in E$ with $(x - \alpha)^2 \mid p(x)$ in $E[x]$.

*5.30 Let $k$ be a field of characteristic $p$.

(i) Prove that if $f(x) = \sum_i a_i x^i \in k[x]$, then $f'(x) = 0$ if and only if the
only nonzero coefficients are those $a_i$ with $p \mid i$.

(ii) If $k$ is finite and $f(x) = \sum_i a_i x^i \in k[x]$, prove that $f'(x) = 0$ if and
only if there is $g(x) \in k[x]$ with $f(x) = g(x)^p$.

(iii) Prove that if $k$ is a finite field, then every irreducible polynomial
$p(x) \in k[x]$ has no repeated roots.

*5.31 (i) If $k = \mathbb{F}_p(t)$, the field of rational functions over $\mathbb{F}_p$, prove that
$x^p - t \in k[x]$ has repeated roots. (It can be shown that $x^p - t$ is an irreducible
polynomial.)

(ii) Prove that $E = k(\alpha)$ is a splitting field of $x^p - t$ over $k$.

(iii) Prove that $\mathrm{Gal}(E/k) = \{1\}$.


5.3 Epilog 

Further investigation of these ideas is the subject of Galois theory, which studies 
the relationship between extension fields and their Galois groups. Aside from its 
intrinsic beauty, Galois theory is used extensively in algebraic number theory. 
The following technical notion turns out to be important.

Definition. A polynomial $f(x) \in k[x]$ is separable if its irreducible factors
have no repeated roots.

We have seen that every polynomial in $k[x]$ is separable if $k$ has characteristic 0
[Exercise 5.29(ii) on page 469] or if $k$ is finite [Exercise 5.30(iii)]. On the other
hand, there are inseparable polynomials, as we have seen in Exercise 5.31. The
following generalization of Theorem 5.26 shows why separable polynomials are
interesting (there is a proof in my book Advanced Modern Algebra, Theorem 4.7).

Theorem. Let $k$ be a field and let $f(x) \in k[x]$ be a separable polynomial. If
$E/k$ is a splitting field of $f(x)$, then $|\mathrm{Gal}(E/k)| = [E : k]$.

Definition. Let $E/k$ be a field extension with Galois group $G = \mathrm{Gal}(E/k)$. If
$H \le G$, then the fixed field $E^H$ is defined by

$$E^H = \{u \in E : \sigma(u) = u \text{ for all } \sigma \in H\}.$$

The following theorems can be proved (for example, see Section 4.2 of my 
book Advanced Modern Algebra). Theorem 5.29, which characterizes splitting 
fields, can be modified in the presence of separability. 

Theorem. Let $E/k$ be a field extension with Galois group $G = \mathrm{Gal}(E/k)$.
Then the following statements are equivalent.

(i) $E$ is a splitting field of some separable polynomial $f(x) \in k[x]$.

(ii) Every irreducible $p(x) \in k[x]$ having one root in $E$ is separable and it
splits in $E[x]$.

(iii) $k = E^G$; that is, if $a \in E$ and $\sigma(a) = a$ for all $\sigma \in G$, then $a \in k$.

Definition. A finite field extension $E/k$ is a Galois extension if it satisfies any
of the equivalent conditions in this theorem.

The following theorem shows that there is an intimate connection between
the intermediate fields $B$ in a Galois extension $E/k$ (that is, subfields $B$ with
$k \subseteq B \subseteq E$) and the subgroups of the Galois group.

Theorem (Fundamental Theorem of Galois Theory). Let $E/k$ be a finite
Galois extension with Galois group $G = \mathrm{Gal}(E/k)$.

(i) The function $H \mapsto E^H$ is a bijection, from the set of all subgroups of
$\mathrm{Gal}(E/k)$ to the set of all intermediate fields, which reverses inclusions:

$$H \le L \quad\text{if and only if}\quad E^L \subseteq E^H.$$

For every intermediate field $B$ and every $H \le G$,

$$E^{\mathrm{Gal}(E/B)} = B \quad\text{and}\quad \mathrm{Gal}(E/E^H) = H.$$

(ii) For every intermediate field $B$ and every subgroup $H$ of $G$,

$$[B : k] = [G : \mathrm{Gal}(E/B)] \quad\text{and}\quad [G : H] = [E^H : k].$$

(iii) An intermediate field $B$ is a Galois extension of $k$ if and only if $\mathrm{Gal}(E/B)$
is a normal subgroup of $G$.

Here are some consequences. 

Theorem (Theorem of the Primitive Element). If $E/k$ is a finite separable
extension, then there is a primitive element $\alpha \in E$; that is, $E = k(\alpha)$.

In particular, every finite extension of $\mathbb{Q}$ has a primitive element. This
follows from a theorem of E. Steinitz which says, given a finite extension $E/k$, that
there exists $\alpha \in E$ with $E = k(\alpha)$ if and only if there are only finitely many
intermediate fields $k \subseteq B \subseteq E$.

Theorem. The finite field $\mathbb{F}_q$, where $q = p^n$, has exactly one subfield of order
$p^d$ for every divisor $d$ of $n$, and no others.

This follows from $\mathrm{Gal}(\mathbb{F}_q/\mathbb{F}_p)$ being cyclic of order $n$.
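To make the correspondence concrete, here is a small sketch for $q = 16$. It assumes the model $\mathbb{F}_{16} = \mathbb{F}_2[x]/(x^4 + x + 1)$, with elements stored as 4-bit integers; this model and the function names are our choices for illustration. The fixed field of the subgroup generated by the $d$th power of the Frobenius map should have $2^d$ elements for each divisor $d$ of 4.

```python
MODULUS = 0b10011  # x^4 + x + 1, irreducible over F_2 (assumed model of F_16)

def mul(a, b):
    """Multiply two elements of F_16, each stored as a 4-bit integer."""
    product = 0
    while b:
        if b & 1:
            product ^= a          # addition in characteristic 2 is XOR
        b >>= 1
        a <<= 1
        if a & 0b10000:           # reduce as soon as degree 4 appears
            a ^= MODULUS
    return product

def frobenius(a, times=1):
    """Apply the Frobenius map a -> a^2 the given number of times."""
    for _ in range(times):
        a = mul(a, a)
    return a

def fixed_field(d):
    """Elements fixed by the subgroup generated by the d-th power of Frobenius."""
    return {a for a in range(16) if frobenius(a, d) == a}

subfield_orders = {d: len(fixed_field(d)) for d in (1, 2, 4)}
frobenius_order = next(k for k in range(1, 17)
                       if all(frobenius(a, k) == a for a in range(16)))
print(subfield_orders, frobenius_order)
```

The three fixed fields found this way are the subfields of orders 2, 4, and 16, one for each divisor of $n = 4$.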

Theorem. If $E/k$ is a Galois extension whose Galois group is abelian, then
every intermediate field is a Galois extension.

This follows because every subgroup of an abelian group is normal. 

There are many proofs of the Fundamental Theorem of Algebra, and there is 
one using Galois theory. 

Theorem (Fundamental Theorem of Algebra). If $f(x) \in \mathbb{C}[x]$ is not a
constant, then $f(x)$ has a root in $\mathbb{C}$.

We now use the Fundamental Theorem of Galois Theory to complete the 
discussion of constructibility in Chapter 4. 

Recall that a prime $p$ is a Fermat prime if $p$ has the form $p = 2^m + 1$ (in
which case $m = 2^t$; see the proof of Corollary 3.104). We end with a proof
of Gauss’ theorem that if $p$ is a Fermat prime, then a regular $p$-gon can be
constructed with straightedge and compass.
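The restriction on the exponent can be observed experimentally. The sketch below (plain trial division, with a search bound of 32 chosen by us) lists the exponents $m \le 32$ for which $2^m + 1$ is prime.

```python
def is_prime(n):
    """Trial division; adequate for the small numbers tested here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

exponents = [m for m in range(1, 33) if is_prime(2**m + 1)]
fermat_primes = [2**m + 1 for m in exponents]
print(exponents, fermat_primes)
```

Every exponent found is a power of 2, although not every power of 2 occurs: $2^{32} + 1$ is divisible by 641.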





Lemma 5.40. Let $E/k$ be a Galois extension with Galois group $G = \mathrm{Gal}(E/k)$.
Given subgroups $G \ge H \ge L$, then

$$[E^L : E^H] = [H : L].$$

Proof. Since $H \mapsto E^H$ is order-reversing, there is a tower of fields

$$k = E^G \subseteq E^H \subseteq E^L \subseteq E$$

(we have $k = E^G$ because $E/k$ is a Galois extension). Theorem 4.31 gives
$[E^L : k] = [E^L : E^H][E^H : k]$, and so the Fundamental Theorem of Galois
Theory gives

$$[E^L : E^H] = \frac{[E^L : k]}{[E^H : k]} = \frac{[G : L]}{[G : H]} = \frac{|G|/|L|}{|G|/|H|} = \frac{|H|}{|L|} = [H : L]. \quad\bullet$$




Theorem 5.41 (Gauss). Let $p$ be an odd prime. A regular $p$-gon is
constructible if and only if $p = 2^m + 1$ for some $m > 0$.

Proof. Necessity was proved in Theorem 4.59, where it was shown that $m$ must
be a power of 2 when $m > 0$.

If $p$ is a prime, then $x^p - 1 = (x - 1)\Phi_p(x)$, where $\Phi_p(x)$ is the $p$th
cyclotomic polynomial. A primitive $p$th root of unity $\zeta$ is a root of $\Phi_p(x)$ and,
since $\Phi_p(x)$ is an irreducible polynomial of degree $p - 1$ (Corollary 3.104), we
have $[\mathbb{Q}(\zeta) : \mathbb{Q}] = p - 1 = 2^m$. By Theorem 5.26, we have
$|\mathrm{Gal}(\mathbb{Q}(\zeta)/\mathbb{Q})| = 2^m$. As any 2-group, $\mathrm{Gal}(\mathbb{Q}(\zeta)/\mathbb{Q})$ has a normal series

$$\mathrm{Gal}(\mathbb{Q}(\zeta)/\mathbb{Q}) = G_0 \ge G_1 \ge \cdots \ge G_t = \{1\}$$

with every factor group of order 2; that is, $[G_{i-1} : G_i] = 2$ for all $i \ge 1$. By the
Fundamental Theorem of Galois Theory, there is a tower of subfields

$$\mathbb{Q} = K_0 \subseteq K_1 \subseteq \cdots \subseteq K_t = \mathbb{Q}(\zeta).$$

Moreover, Lemma 5.40 gives $[K_i : K_{i-1}] = [G_{i-1} : G_i] = 2$ for all $i \ge 1$. This
says that $\zeta$ is polyquadratic, and hence $\zeta$ is constructible, by Theorem 4.53. •
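For $p = 17$ the normal series in this proof can be written down explicitly. The sketch below relies on the standard identification of $\mathrm{Gal}(\mathbb{Q}(\zeta)/\mathbb{Q})$ with the group of units modulo $p$, a fact that is not proved in this passage, and on the choice of 3 as a generator of that group; both are assumptions of the illustration.

```python
def subgroup_generated(g, p):
    """Cyclic subgroup of the units mod p generated by g."""
    elems, x = set(), 1
    while True:
        x = x * g % p
        elems.add(x)
        if x == 1:
            return elems

p, g = 17, 3          # 3 generates the units mod 17 (checked in the test below)
chain, h = [], g
while True:
    H = subgroup_generated(h, p)
    chain.append(H)
    if len(H) == 1:
        break
    h = h * h % p     # squaring a generator gives a subgroup of index 2

orders = [len(H) for H in chain]
indices = [a // b for a, b in zip(orders, orders[1:])]
print(orders, indices)
```

Each successive index is 2, which corresponds, via Lemma 5.40, to a tower of four quadratic extensions between $\mathbb{Q}$ and $\mathbb{Q}(\zeta)$.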

6.1 Finite Abelian Groups

We continue our study of groups by considering finite abelian groups; as is cus- 
tomary, these groups are written additively. We are going to prove that every 
finite abelian group is a direct sum of cyclic groups, and so we begin by consid- 
ering direct sums. 

Definition. The external direct sum of two abelian groups $S$ and $T$ is the
abelian group $S \times T$ whose underlying set is the cartesian product of $S$ and $T$
and whose operation is given by $(s, t) + (s', t') = (s + s', t + t')$.

It is routine to check that the external direct sum is an (abelian) group. For
example, the plane $\mathbb{R}^2$ is a group under vector addition, and $\mathbb{R}^2 = \mathbb{R} \times \mathbb{R}$.

Definition. If $S$ and $T$ are subgroups of an abelian group $G$, then $G$ is the
internal direct sum, denoted by $G = S \oplus T$, if each element $g \in G$ has a unique
expression of the form $g = s + t$, where $s \in S$ and $t \in T$.

If $S$ and $T$ are subgroups of an abelian group $G$, define

$$S + T = \{s + t : s \in S \text{ and } t \in T\}.$$

Now $S + T$ is always a subgroup of $G$, for it is $\langle S \cup T \rangle$, the subgroup generated
by $S$ and $T$ (see Exercise 6.2 on page 485). Saying that $G = S + T$ means that
each $g \in G$ has an expression of the form $g = s + t$, where $s \in S$ and $t \in T$;
saying that $G = S \oplus T$ means that such expressions are unique.

Here is the additive version of Proposition 2.125. We need not say that $S$ and
$T$ are normal subgroups, for every subgroup of an abelian group is normal.

Lemma 6.1. If $S$ and $T$ are subgroups of an abelian group $G$, then $G = S \oplus T$
if and only if $S + T = G$ and $S \cap T = \{0\}$.

Proof. Assume that $G = S \oplus T$. Every $g \in G$ has a unique expression of the
form $g = s + t$, where $s \in S$ and $t \in T$; hence, $G = S + T$. If $x \in S \cap T$,
then $x$ has two expressions as $s + t$, namely, $x = x + 0$ and $x = 0 + x$. Since
expressions are unique, we must have $x = 0$, and so $S \cap T = \{0\}$.

Conversely, $G = S + T$ implies that each $g \in G$ has an expression of the
form $g = s + t$, where $s \in S$ and $t \in T$. To see that this expression is unique,
suppose also that $g = s' + t'$, where $s' \in S$ and $t' \in T$. Then $s + t = s' + t'$
gives $s - s' = t' - t \in S \cap T = \{0\}$. Therefore, $s = s'$ and $t = t'$, as desired. •
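Lemma 6.1 is easy to test in a small group. The sketch below works in the integers modulo 12 (our choice of example) and compares the two sides of the lemma: uniqueness of expressions on the one hand, and the conditions $S + T = G$, $S \cap T = \{0\}$ on the other. The function names are illustrative.

```python
from itertools import product

n = 12  # G = integers mod 12

def cyclic(g):
    """Subgroup of G generated by g."""
    return {(g * k) % n for k in range(n)}

def expressions(g, S, T):
    """All pairs (s, t) in S x T with s + t = g."""
    return [(s, t) for s, t in product(S, T) if (s + t) % n == g]

def is_internal_direct_sum(S, T):
    return all(len(expressions(g, S, T)) == 1 for g in range(n))

def lemma_conditions(S, T):
    total = {(s + t) % n for s, t in product(S, T)}
    return total == set(range(n)) and S & T == {0}

pairs = [(cyclic(4), cyclic(3)), (cyclic(2), cyclic(3)), (cyclic(4), cyclic(6))]
results = [(is_internal_direct_sum(S, T), lemma_conditions(S, T)) for S, T in pairs]
print(results)
```

In the second pair the sum is all of $G$ but the intersection is $\{0, 6\}$, and in the third pair the intersection is trivial but the sum is a proper subgroup; in both cases uniqueness of expressions fails, as the lemma predicts.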

Definition. A subgroup $S$ of an abelian group $G$ is called a direct summand
if there exists a subgroup $T$ of $G$ with $G = S \oplus T$; that is, $S + T = G$ and
$S \cap T = \{0\}$.

Note that $S \times T$ cannot equal $S \oplus T$, for neither $S$ nor $T$ is a subgroup of
$S \times T$; indeed, they are not even subsets of the cartesian product. This is easily
remedied. Given abelian groups $S$ and $T$, define subgroups $S^*$ and $T^*$ of the
external direct sum $S \times T$ by

$$S^* = \{(s, 0) : s \in S\} \quad\text{and}\quad T^* = \{(0, t) : t \in T\}.$$

Of course, $S \cong S^*$ via $s \mapsto (s, 0)$ and $T \cong T^*$ via $t \mapsto (0, t)$. It is easy to see
that $S \times T = S^* \oplus T^*$, for $S^* + T^* = S \times T$, because $(s, t) = (s, 0) + (0, t)$, and
$S^* \cap T^* = \{(0, 0)\}$. Thus, the external direct sum can be viewed as an internal
direct sum (of subgroups isomorphic to $S$ and to $T$). The next result shows,
conversely, that an internal direct sum is isomorphic to an external one.

Proposition 6.2. Let $S$ and $T$ be subgroups of an abelian group $G$ with
$G = S + T$. If $G = S \oplus T$ (that is, $S \cap T = \{0\}$), then there is an isomorphism
$\varphi\colon S \oplus T \to S \times T$ with $\varphi(S) = S^*$ and $\varphi(T) = T^*$.

Proof. If $g \in S \oplus T$, then Lemma 6.1 says that $g$ has a unique expression
of the form $g = s + t$. Define $\varphi\colon S \oplus T \to S \times T$ by $\varphi(g) = \varphi(s + t) = (s, t)$.
Uniqueness of the expression $g = s + t$ implies that $\varphi$ is a well-defined
function. It is obvious that $\varphi(S) = S^*$ and $\varphi(T) = T^*$. Let us check that $\varphi$ is a
homomorphism. If $g' = s' + t'$, then $\varphi(g') = (s', t')$ and $g + g' = (s + s') + (t + t')$; hence,

$$\begin{aligned}
\varphi(g + g') &= \varphi(s + s' + t + t') \\
&= (s + s', t + t') \\
&= (s, t) + (s', t') \\
&= \varphi(g) + \varphi(g').
\end{aligned}$$

If $\varphi(g) = (s, t) = (0, 0)$, then $s = 0$, $t = 0$, and $g = s + t = 0$; hence, $\varphi$ is
injective. Finally, $\varphi$ is surjective, for if $(s, t) \in S \times T$, then $\varphi(s + t) = (s, t)$. •

We now extend this discussion to more than two summands. 

Definition. The external direct sum of abelian groups $S_1, S_2, \dots, S_n$ is the
abelian group $S_1 \times S_2 \times \cdots \times S_n$ whose underlying set is the cartesian product
of $S_1, S_2, \dots, S_n$, and whose operation is given by

$$(s_1, s_2, \dots, s_n) + (s_1', s_2', \dots, s_n') = (s_1 + s_1', s_2 + s_2', \dots, s_n + s_n').$$

For example, euclidean $n$-space $\mathbb{R}^n$ is the external direct sum of $\mathbb{R}$ with itself
$n$ times: $\mathbb{R}^n = \mathbb{R} \times \cdots \times \mathbb{R}$.

Definition. If $S_1, \dots, S_n$ are subgroups of an abelian group $G$, then $G$ is the
internal direct sum, denoted by

$$G = S_1 \oplus \cdots \oplus S_n,$$

if, for each $g \in G$, there are unique $s_i \in S_i$ with $g = s_1 + \cdots + s_n$.

Example 6.3.

Let $k$ be a field and let $G = k^n$ be the external direct sum of $k$ with itself $n$ times.
As usual, let $e_1, \dots, e_n$ be the standard basis; that is, $e_i = (0, \dots, 0, 1, 0, \dots, 0)$,
the $n$-tuple having $i$th coordinate 1 and all other coordinates 0. If $V_i$ is the
one-dimensional subspace spanned by $e_i$, that is, $V_i = \{ae_i : a \in k\}$, then $k^n$ is the
internal direct sum $k^n = V_1 \oplus \cdots \oplus V_n$, for every vector has a unique expression
as a linear combination of a basis. ◄

We now show that every external direct sum can be viewed as an internal
direct sum. If $S_1, \dots, S_n$ are abelian groups, define, for each $i$,

$$S_i^* = \{(0, \dots, 0, s_i, 0, \dots, 0) : s_i \in S_i\} \subseteq S_1 \times \cdots \times S_n;$$

that is, $S_i^*$ consists of all those $n$-tuples in the cartesian product whose only
nonzero coordinates occur in the $i$th position. Of course, $S_i$ and $S_i^*$ are isomorphic,
for all $i$, via $s_i \mapsto (0, \dots, 0, s_i, 0, \dots, 0)$. Let us check that
$G = S_1 \times \cdots \times S_n$ is the internal direct sum

$$G = S_1^* \oplus \cdots \oplus S_n^*.$$

If $g = (s_1, \dots, s_n) \in S_1 \times \cdots \times S_n$, then

$$g = (s_1, 0, \dots, 0) + (0, s_2, 0, \dots, 0) + \cdots + (0, \dots, 0, s_n).$$

Such an expression is unique, for if $(s_1, \dots, s_n) = (t_1, \dots, t_n)$, then the
definition of equality of $n$-tuples gives $s_i = t_i$ for all $i$.





Given subgroups $S_1, S_2, \dots, S_n$ which generate an abelian group $G$, how
do we generalize Lemma 6.1 to several summands? One’s first guess is that
assuming $S_i \cap S_j = \{0\}$ for all $i \ne j$ should imply that $G = S_1 \oplus \cdots \oplus S_n$, but
we now show that this is not adequate.

Let $V$ be a 2-dimensional vector space over a field $k$, and let $x, y$ be a basis;
hence, $V = \langle x \rangle \oplus \langle y \rangle$. It is easy to check that the intersection of any two of
the subspaces $\langle x \rangle$, $\langle y \rangle$, and $\langle x + y \rangle$ is $\{0\}$. On the other hand, we do not have
$V = \langle x \rangle \oplus \langle y \rangle \oplus \langle x + y \rangle$ because 0 has two expressions in
$\langle x \rangle + \langle y \rangle + \langle x + y \rangle$, namely, $0 = 0 + 0 + 0$ and $0 = -x - y + (x + y)$.
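This counterexample can be enumerated completely when $k$ is finite. The sketch below takes $k = \mathbb{F}_3$ and the standard basis $x = (1, 0)$, $y = (0, 1)$; both choices are ours, made only so that every subspace is a small finite set.

```python
from itertools import product

p = 3  # work over the field with 3 elements

def span(v):
    """One-dimensional subspace of F_p^2 spanned by v."""
    return {((a * v[0]) % p, (a * v[1]) % p) for a in range(p)}

def add(u, v):
    return ((u[0] + v[0]) % p, (u[1] + v[1]) % p)

X, Y, XY = span((1, 0)), span((0, 1)), span((1, 1))
pairwise_trivial = all(A & B == {(0, 0)} for A, B in [(X, Y), (X, XY), (Y, XY)])
zero_expressions = [(a, b, c) for a, b, c in product(X, Y, XY)
                    if add(add(a, b), c) == (0, 0)]
print(pairwise_trivial, len(zero_expressions))
```

The pairwise intersections are trivial, yet 0 has more than one expression as a sum of elements from the three subspaces, so the sum is not direct.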

We are now going to show that every internal direct sum is isomorphic to an 
external one. Here is the generalization of Lemma 6.1. 

Proposition 6.4. Let $G = S_1 + S_2 + \cdots + S_n$, where the $S_i$ are subgroups; that
is, each $g \in G$ has an expression of the form

$$g = s_1 + s_2 + \cdots + s_n,$$

where $s_i \in S_i$ for all $i$. Then the following conditions are equivalent.

(i) $G = S_1 \oplus S_2 \oplus \cdots \oplus S_n$; that is, for every element $g \in G$, the expression
$g = s_1 + \cdots + s_n$, where $s_i \in S_i$ for all $i$, is unique.

(ii) There is an isomorphism $\varphi\colon G \to S_1 \times S_2 \times \cdots \times S_n$ with $\varphi(S_i) = S_i^*$ for
all $i$.

(iii) If we define $G_i = S_1 + \cdots + \widehat{S_i} + \cdots + S_n$, where $\widehat{S_i}$ means that the term
$S_i$ is omitted from the sum, then $S_i \cap G_i = \{0\}$ for each $i$.

Proof.

(i) $\Rightarrow$ (ii) If $g \in G$ and $g = s_1 + \cdots + s_n$, then define $\varphi\colon G \to S_1 \times \cdots \times S_n$
by $\varphi(g) = \varphi(s_1 + \cdots + s_n) = (s_1, \dots, s_n)$. Uniqueness of the expression
for $g$ shows that $\varphi$ is well-defined. It is straightforward to prove that $\varphi$ is an
isomorphism with $\varphi(S_i) = S_i^*$ for all $i$.

(ii) $\Rightarrow$ (iii) If $g \in S_i \cap G_i$, then $\varphi(g) \in S_i^* \cap (S_1^* + \cdots + \widehat{S_i^*} + \cdots + S_n^*)$. But
if $\varphi(g) \in S_1^* + \cdots + \widehat{S_i^*} + \cdots + S_n^*$, then its $i$th coordinate is 0; if $\varphi(g) \in S_i^*$,
then its $j$th coordinates are 0 for all $j \ne i$. Therefore, $\varphi(g) = 0$. Since $\varphi$ is an
isomorphism, it follows that $g = 0$.

(iii) $\Rightarrow$ (i) Let $g \in G$, and suppose that

$$g = s_1 + \cdots + s_n = t_1 + \cdots + t_n,$$

where $s_i, t_i \in S_i$ for all $i$. For each $i$, we have
$s_i - t_i = \sum_{j \ne i} (t_j - s_j) \in S_i \cap (S_1 + \cdots + \widehat{S_i} + \cdots + S_n) = \{0\}$.
Therefore, $s_i = t_i$ for all $i$, and the expression $g = \sum_i s_i$ is unique. •

Notation. From now on, we shall use the notation $S_1 \oplus \cdots \oplus S_n$ to denote
either version of the direct sum, external or internal, because our point of view
is almost always internal. We will also write¹

$$\bigoplus_{i=1}^{n} S_i = S_1 \oplus \cdots \oplus S_n.$$

The notation $G = \sum_i S_i$ abbreviates $G = S_1 + \cdots + S_n = \langle S_1 \cup \cdots \cup S_n \rangle$.
Thus, $G = \sum_i S_i$ if every $g \in G$ has an expression of the form $g = \sum_i s_i$ for
$s_i \in S_i$, while $G = \bigoplus_i S_i$ if $G = \sum_i S_i$ and expressions $g = \sum_i s_i$ are unique.

It will be convenient to analyze groups “one prime at a time.” 

Definition. If $p$ is a prime, then an abelian group $G$ is $p$-primary² if, for each
$a \in G$, there is $n \ge 1$ with $p^n a = 0$.

If $G$ is any abelian group, then its $p$-primary component is

$$G_p = \{a \in G : p^n a = 0 \text{ for some } n \ge 1\}.$$

If we do not want to specify the prime $p$, we may write that an abelian
group is primary (instead of $p$-primary). It is clear that primary components are
subgroups. This is not true for nonabelian groups. For example, if $G = S_3$,
then $G_2 = \{(1), (1\ 2), (1\ 3), (2\ 3)\}$, which is not a subgroup of $S_3$ because
$(1\ 2)(1\ 3) = (1\ 3\ 2) \notin G_2$.
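The nonabelian example can be confirmed by listing $S_3$. In the sketch below, permutations of $\{0, 1, 2\}$ are stored as tuples and the set called `G2` collects the elements whose order is a power of 2; the representation and names are ours.

```python
from itertools import permutations

def compose(p, q):
    """Return p∘q (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(3))

def order(p):
    identity, power, k = (0, 1, 2), p, 1
    while power != identity:
        power, k = compose(p, power), k + 1
    return k

S3 = list(permutations(range(3)))
G2 = {p for p in S3 if order(p) in (1, 2)}   # the only powers of 2 that occur
closed = all(compose(a, b) in G2 for a in G2 for b in G2)
print(len(G2), closed)
```

The set has four elements, which already rules out a subgroup of a group of order 6 by Lagrange's theorem, and the closure test fails directly as well.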

Theorem 6.5 (Primary Decomposition).

(i) Every finite abelian group $G$ is the direct sum of its $p$-primary components:

$$G = \bigoplus_p G_p.$$

(ii) Two finite abelian groups $G$ and $G'$ are isomorphic if and only if
$G_p \cong G'_p$ for every prime $p$.

Proof 

¹ In Advanced Modern Algebra, the sequel to the previous edition of this book, I denote the
direct sum by $\sum_i S_i$. I now think that it is clearer to denote the direct sum by $\bigoplus_i S_i$ (which
is one of several commonly used notations) and to denote the sum, the subgroup generated by
$\bigcup_i S_i$, by $\sum_i S_i$. If I have a chance to redo the sequel, then I will adopt this notation.

² In Chapter 2, we called a finite group $G$ a $p$-group if each $g \in G$ has order some power
of $p$. Thus, a $p$-primary abelian group is just an abelian $p$-group. If one is working wholly in
the context of abelian groups, as we are now doing, then the term $p$-primary is used; if one is
working with general groups, then the usage of the term $p$-group is preferred.

(i) Let $x \in G$ be nonzero, and let its order be $d$. By the fundamental theorem
of arithmetic, there are distinct primes $p_1, \dots, p_n$ and positive exponents
$e_1, \dots, e_n$ with

$$d = p_1^{e_1} \cdots p_n^{e_n}.$$

Define $r_i = d/p_i^{e_i}$, so that $p_i^{e_i} r_i = d$. It follows that $r_i x \in G_{p_i}$ for each $i$. But the
gcd of $r_1, \dots, r_n$ is 1 (the only possible prime divisors of $d$ are $p_1, \dots, p_n$, but
no $p_i$ is a common divisor because $p_i \nmid r_i$); hence, there are integers $s_1, \dots, s_n$
with $1 = \sum_i s_i r_i$. Therefore,

$$x = \sum_i s_i r_i x \in G_{p_1} + \cdots + G_{p_n}.$$

Write $H_i = G_{p_1} + \cdots + \widehat{G_{p_i}} + \cdots + G_{p_n}$. By Proposition 6.4, it suffices to
prove that if

$$x \in G_{p_i} \cap H_i,$$

then $x = 0$. Since $x \in G_{p_i}$, we have $p_i^{\ell} x = 0$ for some $\ell \ge 0$; since $x \in H_i$, we
have $ux = 0$, where $u = \prod_{j \ne i} p_j^{g_j}$ for suitable exponents $g_j \ge 0$. But $p_i^{\ell}$ and $u$
are relatively prime, so there exist integers $s$ and $t$ with $1 = sp_i^{\ell} + tu$. Therefore,

$$x = (sp_i^{\ell} + tu)x = sp_i^{\ell}x + tux = 0.$$

(ii) If $f\colon G \to G'$ is a homomorphism, then $f(G_p) \subseteq G'_p$ for every prime $p$,
for if $p^{\ell} a = 0$, then $0 = f(p^{\ell} a) = p^{\ell} f(a)$. If $f$ is an isomorphism, then
$f^{-1}\colon G' \to G$ is also an isomorphism (so that $f^{-1}(G'_p) \subseteq G_p$ for all $p$). It
follows that each restriction $f|G_p\colon G_p \to G'_p$ is an isomorphism, with inverse
$f^{-1}|G'_p$.

Conversely, if there are isomorphisms $f_p\colon G_p \to G'_p$ for all $p$, then there is
an isomorphism $\varphi\colon \bigoplus_p G_p \to \bigoplus_p G'_p$ given by $\sum_p a_p \mapsto \sum_p f_p(a_p)$. •
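Part (i) of the theorem can be watched in action in a cyclic group. The following sketch uses the integers modulo 60 (our choice of example) and hypothetical helper names; it computes each primary component and then checks that every element is a sum of components in exactly one way.

```python
from itertools import product

def prime_divisors(n):
    primes, d = [], 2
    while n > 1:
        if n % d == 0:
            primes.append(d)
            while n % d == 0:
                n //= d
        d += 1
    return primes

def primary_component(n, p):
    """Elements a of Z/n with p^k * a = 0 for some k."""
    # p^n is a large enough power: the order of a divides n, so its p-part is below p^n.
    return {a for a in range(n) if (pow(p, n, n) * a) % n == 0}

n = 60
components = {p: primary_component(n, p) for p in prime_divisors(n)}
counts = {g: 0 for g in range(n)}
for parts in product(*components.values()):
    counts[sum(parts) % n] += 1          # one more expression of this element
unique = all(c == 1 for c in counts.values())
print({p: sorted(c) for p, c in components.items()}, unique)
```

Each of the 60 elements is reached exactly once, so the group is the internal direct sum of its 2-, 3-, and 5-primary components.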


Notation. If $G$ is an abelian group and $m$ is an integer, then

$$mG = \{ma : a \in G\}.$$

It is easy to see that mG is a subgroup of G. 

The next type of subgroup will play an important role. 


Definition. Let $p$ be a prime and let $G$ be a $p$-primary³ abelian group. A
subgroup $S \subseteq G$ is a pure⁴ subgroup if, for all $n \ge 0$,

$$S \cap p^n G = p^n S.$$

³ If $G$ is not a primary group, then a pure subgroup $S \subseteq G$ is defined to be a subgroup
which satisfies $S \cap mG = mS$ for all $m \in \mathbb{Z}$ (see Exercises 6.3 and 6.4 on page 485).

⁴ A polynomial equation is called pure if it has the form $x^n = a$; pure subgroups are
defined in terms of such equations, and they are probably so called because of this.

The inclusion $S \cap p^n G \supseteq p^n S$ is true for every subgroup $S \subseteq G$, and so it
is only the reverse inclusion $S \cap p^n G \subseteq p^n S$ that is significant. It says that if
$s \in S$ satisfies an equation $s = p^n a$ for some $a \in G$, then there exists $s' \in S$
with $s = p^n s'$; that is, if an equation $s = p^n x$ is solvable for $x \in G$, then it is
solvable for $x \in S$.

Example 6.6. 


(i) Every direct summand $S$ of $G$ is a pure subgroup. Let $G = S \oplus T$, and
suppose that $s = p^n g$, where $s \in S$ and $g \in G$. Now $g = u + v$, where
$u \in S$ and $v \in T$, and so $s = p^n u + p^n v$. Hence, $p^n v = s - p^n u \in
S \cap T = \{0\}$, so that $p^n v = 0$. Therefore, $s = p^n u$, and $S$ is pure in $G$.

(ii) If $G = \langle a \rangle$ is a cyclic group of order $p^2$, where $p$ is a prime, then $S = \langle pa \rangle$
is not a pure subgroup of $G$, for if $s = pa \in S$, then there is no element
$s' \in S$ with $s = pa = ps'$. ◄
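Both parts of Example 6.6 can be tested directly from the definition of purity. The sketch below models a finite $p$-primary group as a product of cyclic groups $\mathbb{Z}/m_1 \times \cdots \times \mathbb{Z}/m_r$ given by a tuple `orders`; this encoding, the prime $p = 2$, and the function names are assumptions made for illustration.

```python
from itertools import product
from math import prod

def scale(k, g, orders):
    """Multiply the element g of Z/orders[0] x ... x Z/orders[-1] by the integer k."""
    return tuple((k * x) % m for x, m in zip(g, orders))

def cyclic_subgroup(gen, orders):
    return {scale(k, gen, orders) for k in range(prod(orders))}

def is_pure(S, orders, p):
    """Check S ∩ p^n G = p^n S for every n >= 0, where G is the whole p-primary group."""
    G = list(product(*(range(m) for m in orders)))
    zero = tuple(0 for _ in orders)
    n = 0
    while True:
        pnG = {scale(p**n, g, orders) for g in G}
        pnS = {scale(p**n, s, orders) for s in S}
        if S & pnG != pnS:
            return False
        if pnG == {zero}:        # from here on both sides are {0}
            return True
        n += 1

summand = cyclic_subgroup((0, 1), (2, 4))   # {0} x Z/4 inside Z/2 x Z/4
not_pure = cyclic_subgroup((2,), (4,))      # <pa> in a cyclic group of order p^2, p = 2
print(is_pure(summand, (2, 4), 2), is_pure(not_pure, (4,), 2))
```

The first subgroup is a direct summand and passes the test; the second is the subgroup of part (ii) and fails at $n = 1$.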

In Exercise 6.11 on page 486, we shall see that the converse of
Example 6.6(i) is true: if $G$ is a finite abelian group and $S$ is a subgroup of $G$, then $S$ is
a pure subgroup if and only if $S$ is a direct summand. This is the reason we have
introduced pure subgroups, for it is easier to prove that $S$ is a direct summand by
verifying whether certain equations are solvable than to construct a subgroup $T$
with $S + T = G$ and $S \cap T = \{0\}$.


Lemma 6.7. If $p$ is a prime and $G \ne \{0\}$ is a finite $p$-primary abelian group,
then $G$ has a nonzero pure cyclic subgroup.

Proof. Let $G = \langle x_1, \dots, x_q \rangle$. The order of $x_i$ is $p^{n_i}$ for all $i$, because $G$ is
$p$-primary. If $x \in G$, then $x = \sum_i a_i x_i$, where $a_i \in \mathbb{Z}$, so that if $\ell$ is the largest of
the $n_i$, then $p^{\ell} x = 0$. Now choose any $y \in G$ of largest order $p^{\ell}$ (for example,
$y$ could be one of the $x_i$). We claim that $S = \langle y \rangle$ is a pure subgroup of $G$.
Suppose that $s \in S$, so that $s = mp^t y$, where $t \ge 0$ and $p \nmid m$, and let

$$s = p^n a$$

for some $a \in G$. If $t \ge n$, define $s' = mp^{t-n} y \in S$, and note that

$$p^n s' = p^n mp^{t-n} y = mp^t y = s.$$

If $t < n$, then

$$p^{\ell} a = p^{\ell - n} p^n a = p^{\ell - n} s = p^{\ell - n} mp^t y = mp^{\ell - n + t} y.$$

But $p \nmid m$ and $\ell - n + t < \ell$, because $-n + t < 0$, and so $p^{\ell} a \ne 0$. This
contradicts $y$ having largest order, and so this case cannot occur. •

Proposition 6.8. If $G$ is an abelian group and $p$ is a prime, then $G/pG$ is a
vector space over $\mathbb{F}_p$ which is finite-dimensional when $G$ is finite.

Proof. If $[r] \in \mathbb{F}_p$ and $a \in G$, define scalar multiplication

$$[r](a + pG) = ra + pG.$$

This formula is well-defined, for if $k \equiv r \bmod p$, then $k = r + pm$ for some
integer $m$, and so

$$ka + pG = ra + pma + pG = ra + pG,$$

because $pma \in pG$. It is now routine to check that the axioms for a vector space
do hold. If $G$ is finite, then so is $G/pG$, and it is clear that $G/pG$ has a finite
basis. •

Definition. If $p$ is a prime and $G$ is a finite $p$-primary abelian group, then

$$d(G) = \dim(G/pG).$$

Observe that $d$ is additive over direct sums,

$$d(G \oplus H) = d(G) + d(H),$$

for Proposition 2.124 gives

$$\frac{G \oplus H}{p(G \oplus H)} = \frac{G \oplus H}{pG \oplus pH} \cong \frac{G}{pG} \oplus \frac{H}{pH}.$$

The dimension of the left side is $d(G \oplus H)$ and the dimension of the right side
is $d(G) + d(H)$, for the union of bases of $G/pG$ and of $H/pH$, respectively, is
a basis of $(G/pG) \oplus (H/pH)$.

The abelian groups G with d(G) = 1 are easily characterized. 

Lemma 6.9. If $G$ is a $p$-primary abelian group, then $d(G) = 1$ if and only if
$G$ is cyclic.

Proof. If $G$ is cyclic, then so is any quotient of $G$; in particular, $G/pG$ is cyclic,
and so $\dim(G/pG) = 1$.

Conversely, if $G/pG = \langle z + pG \rangle$, then $G/pG \cong \mathbb{I}_p$. Since $\mathbb{I}_p$ is a simple
group, the correspondence theorem says that $pG$ is a maximal subgroup of $G$.
We claim that $pG$ is the only maximal subgroup of $G$. If $L \subseteq G$ is any maximal
subgroup, then $G/L \cong \mathbb{I}_p$, for $G/L$ is a simple abelian group of order a power
of $p$, hence has order $p$ (by Proposition 2.151: the abelian simple groups are
precisely the cyclic groups of prime order). Thus, if $a \in G$, then $p(a + L) = 0$
in $G/L$, so that $pa \in L$; hence $pG \subseteq L$. But $pG$ is maximal, and so $pG = L$.
It follows that every proper subgroup of $G$ is contained in $pG$ (for every proper
subgroup is contained in some maximal subgroup). In particular, if $\langle z \rangle$ is a proper
subgroup of $G$, then $\langle z \rangle \subseteq pG$, contradicting $z + pG$ being a generator of $G/pG$.
Therefore, $G = \langle z \rangle$, and so $G$ is cyclic. •

If $G = (\mathbb{I}_p)^n$, then $pG = \{0\}$, $G/pG \cong G$, and $d(G) = \dim(G)$. More
generally, if $G$ is a direct sum of $p$-primary cyclic groups, say, $G = \bigoplus_i C_i$, then
$pG = \bigoplus_i pC_i$, and Proposition 2.124 gives

$$G/pG = \Bigl(\bigoplus_i C_i\Bigr) \Big/ \Bigl(\bigoplus_i pC_i\Bigr) \cong \bigoplus_i (C_i/pC_i).$$

We have just seen that $d(C_i) = 1$ for all $i$, and so additivity of $d$ over direct sums
shows that $d(G)$ counts the number of cyclic summands in this decomposition
of $G$.
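The count can be reproduced by computing $|G/pG| = |G|/|pG|$ directly. In the sketch below, a finite $p$-primary group is again modelled as a product of cyclic groups of prime-power orders listed in a tuple `orders`; the encoding and the function name `d` (after the text's $d(G)$) are our choices.

```python
from itertools import product

def d(orders, p):
    """dim over F_p of G/pG, where G = Z/orders[0] x ... and each order is a power of p."""
    G = list(product(*(range(m) for m in orders)))
    pG = {tuple((p * x) % m for x, m in zip(g, orders)) for g in G}
    index, dim = len(G) // len(pG), 0     # |G/pG| = p^dim
    while index > 1:
        index //= p
        dim += 1
    return dim

print(d((2, 4, 8), 2), d((9,), 3), d((3, 3), 3))
```

The value returned equals the number of cyclic factors in each case, and additivity can be seen from `d((2, 4), 2) + d((8,), 2)`.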

Lemma 6.10. Let $G$ be a finite $p$-primary abelian group.

(i) If $S \subseteq G$, then $d(G/S) \le d(G)$.

(ii) If $S$ is a pure subgroup of $G$, then

$$d(G) = d(S) + d(G/S).$$

Proof.

(i) By the correspondence theorem, $p(G/S) = (pG + S)/S$, so that

$$(G/S)/p(G/S) = (G/S)/[(pG + S)/S] \cong G/(pG + S),$$

by the third isomorphism theorem. Since $pG \subseteq pG + S$, there is a surjective
homomorphism (of vector spaces over $\mathbb{F}_p$),

$$G/pG \to G/(pG + S),$$

namely, $g + pG \mapsto g + (pG + S)$. Hence, $\dim(G/pG) \ge \dim(G/(pG + S))$;
that is, $d(G) \ge d(G/S)$.

(ii) We now analyze $(pG + S)/pG$, the kernel of $G/pG \to G/(pG + S)$. By
the second isomorphism theorem,

$$(pG + S)/pG \cong S/(S \cap pG).$$

Since $S$ is a pure subgroup, $S \cap pG = pS$. Therefore,

$$(pG + S)/pG \cong S/pS,$$

and so $\dim[(pG + S)/pG] = d(S)$. But if $W$ is a subspace of a finite-dimensional
vector space $V$, then $\dim(V) = \dim(W) + \dim(V/W)$, by Exercise 4.16 on
page 344. Hence, if $V = G/pG$ and $W = (pG + S)/pG$, we have

$$d(G) = d(S) + d(G/S). \quad\bullet$$

Theorem 6.11 (Basis Theorem). Every finite abelian group $G$ is a direct sum
of primary cyclic groups.

Proof. By the primary decomposition, Theorem 6.5, we may assume that $G$ is
$p$-primary for some prime $p$ (for if every primary component is a direct sum of
cyclic groups, so is $G$). We prove that $G$ is a direct sum of cyclic groups by
induction on $d(G) \ge 1$. The base step is easy, for Lemma 6.9 shows that $G$ must
be cyclic in this case.

To prove the inductive step, we begin by using Lemma 6.7 to find a nonzero
pure cyclic subgroup $S \subseteq G$. By Lemma 6.10, we have

$$d(G/S) = d(G) - d(S) = d(G) - 1 < d(G).$$

By induction, $G/S$ is a direct sum of cyclic groups, say,

$$G/S = \bigoplus_{i=1}^{q} \langle \bar{x}_i \rangle,$$

where $\bar{x}_i = x_i + S$.

Let $x \in G$ and let $\bar{x}$ have order $p^{\ell}$, where $\bar{x} = x + S$. We claim that there is
$z \in G$ with $z + S = \bar{x} = x + S$ such that order $z$ = order $\bar{x}$. Now $x$ has order
$p^n$, where $n \ge \ell$. But $p^{\ell}(x + S) = p^{\ell}\bar{x} = 0$ in $G/S$, so there is some $s \in S$ with
$p^{\ell} x = s$. By purity, there is $s' \in S$ with $p^{\ell} x = p^{\ell} s'$. If we define $z = x - s'$,
then $z + S = x + S$ and $p^{\ell} z = 0$. Hence, if $m\bar{x} = 0$ in $G/S$, then $p^{\ell} \mid m$, and so
$mz = 0$ in $G$.

For each $i$, choose $z_i \in G$ with $z_i + S = \bar{x}_i = x_i + S$ and with order
$z_i$ = order $\bar{x}_i$; let $T = \langle z_1, \dots, z_q \rangle$. Now $S + T = G$, because $G$ is generated
by $S$ and the $z_i$’s. To see that $G = S \oplus T$, it now suffices to prove that $S \cap T = \{0\}$.
If $y \in S \cap T$, then $y = \sum_i m_i z_i$, where $m_i \in \mathbb{Z}$. Now $y \in S$, and so $\sum_i m_i \bar{x}_i = 0$
in $G/S$. Since this is a direct sum, each $m_i \bar{x}_i = 0$; after all, for each $i$,

$$-m_i \bar{x}_i = \sum_{j \ne i} m_j \bar{x}_j \in \langle \bar{x}_i \rangle \cap \bigl(\langle \bar{x}_1 \rangle + \cdots + \widehat{\langle \bar{x}_i \rangle} + \cdots + \langle \bar{x}_q \rangle\bigr) = \{0\}.$$

Therefore, $m_i z_i = 0$ for all $i$, and hence $y = 0$.

Finally, $G = S \oplus T$ implies $d(G) = d(S) + d(T) = 1 + d(T)$, so that $d(T) <
d(G)$. By induction, $T$ is a direct sum of cyclic groups, and this completes the
proof. •

When are two finite abelian groups $G$ and $G'$ isomorphic? By the basis
theorem, such groups are direct sums of cyclic groups, and so one’s first guess is
that $G \cong G'$ if they have the same number of cyclic summands of each type. But
this hope is dashed by Theorem 2.126, which says that if $m$ and $n$ are relatively
prime, then $\mathbb{I}_{mn} \cong \mathbb{I}_m \times \mathbb{I}_n$; for example, $\mathbb{I}_6 \cong \mathbb{I}_2 \times \mathbb{I}_3$. Thus, we retreat and
try to count primary cyclic summands. But how can we do this? As in the
fundamental theorem of arithmetic, we must ask whether there is some kind of
unique factorization theorem here.

Before stating the next lemma, recall that we have defined 

d(G) = dim(G/pG). 

In particular, $d(pG) = \dim(pG/p^2G)$ and, more generally,

$$d(p^n G) = \dim(p^n G/p^{n+1} G).$$

Lemma 6.12. Let $G$ be a finite $p$-primary abelian group, where $p$ is a prime,
and let $G = \bigoplus_j C_j$, where each $C_j$ is cyclic. If $b_n$ is the number of summands
$C_j$ having order $p^n$, then there is some $t \ge 1$ with

$$d(p^n G) = b_{n+1} + b_{n+2} + \cdots + b_t.$$

Proof. Let $B_n$ be the direct sum of all $C_j$, if any, with order $p^n$. Thus,

$$G = B_1 \oplus B_2 \oplus \cdots \oplus B_t$$

for some $t$. Now

$$p^n G = p^n B_{n+1} \oplus \cdots \oplus p^n B_t,$$

because $p^n B_j = \{0\}$ for all $j \le n$. Similarly,

$$p^{n+1} G = p^{n+1} B_{n+2} \oplus \cdots \oplus p^{n+1} B_t.$$

Now Proposition 2.124 shows that $p^n G/p^{n+1} G$ is isomorphic to

$$[p^n B_{n+1}/p^{n+1} B_{n+1}] \oplus [p^n B_{n+2}/p^{n+1} B_{n+2}] \oplus \cdots \oplus [p^n B_t/p^{n+1} B_t].$$

Since $d$ is additive over direct sums, we have

$$d(p^n G) = b_{n+1} + b_{n+2} + \cdots + b_t. \quad\bullet$$

The numbers b n can now be described in terms of G. 

Definition. If $G$ is a finite $p$-primary abelian group, where $p$ is a prime, then
$U_p(n, G) = d(p^nG) - d(p^{n+1}G)$.

Lemma 6.12 shows that

$d(p^nG) = b_{n+1} + \cdots + b_t$

and

$d(p^{n+1}G) = b_{n+2} + \cdots + b_t$,
so that $U_p(n, G) = b_{n+1}$.

Theorem 6.13. If $p$ is a prime, then any two decompositions of a finite
$p$-primary abelian group $G$ into direct sums of cyclic groups have the same number
of cyclic summands of each type. More precisely, for each $n \ge 0$, the number of
cyclic summands having order $p^{n+1}$ is $U_p(n, G)$.

Proof. By the basis theorem, there exist cyclic subgroups $C_i$ with $G = \bigoplus_i C_i$.
The lemma shows, for each $n \ge 0$, that the number of $C_i$ having order $p^{n+1}$
is $U_p(n, G)$, a number that is defined without any mention of the given
decomposition of $G$ into a direct sum of cyclics. Thus, if $G = \bigoplus_j D_j$ is another
decomposition of $G$, where each $D_j$ is cyclic, then the number of $D_j$ having
order $p^{n+1}$ is also $U_p(n, G)$, as desired. •
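The invariants $U_p(n, G)$ can be computed from orders of subgroups alone. The following is a minimal Python sketch, assuming $G$ is handed to us as a list of exponents $e_j$ (so that $G$ is the direct sum of cyclic groups of orders $p^{e_j}$); the function names `d` and `U` are illustrative choices, not notation from the text beyond the obvious correspondence.

```python
def d(exponents, n):
    """dim over F_p of p^n G / p^(n+1) G, where G is the direct sum of
    cyclic groups of orders p^e for e in `exponents`."""
    # A cyclic group C of order p^e has |p^k C| = p^max(e - k, 0),
    # so log_p |p^k G| is the sum of these exponents.
    def log_order(k):
        return sum(max(e - k, 0) for e in exponents)
    return log_order(n) - log_order(n + 1)


def U(exponents, n):
    """U_p(n, G) = d(p^n G) - d(p^(n+1) G)."""
    return d(exponents, n) - d(exponents, n + 1)


# G = I_p + I_p + I_{p^2} + I_{p^3}, for any prime p
G = [1, 1, 2, 3]
counts = [U(G, n) for n in range(4)]
print(counts)  # [2, 1, 1, 0]: summands of order p, p^2, p^3, p^4
```

The list `counts` recovers the number of summands of each order, as Theorem 6.13 predicts, even though `U` never inspects the individual exponents directly.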


Corollary 6.14. If $G$ and $G'$ are finite $p$-primary abelian groups, then
$G \cong G'$ if and only if $U_p(n, G) = U_p(n, G')$ for all $n \ge 0$.

Proof. If $\varphi : G \to G'$ is an isomorphism, then $\varphi(p^nG) = p^nG'$ for all $n \ge 0$,
and hence it induces isomorphisms of the $\mathbb{F}_p$-vector spaces $p^nG/p^{n+1}G \cong
p^nG'/p^{n+1}G'$ for all $n \ge 0$. Hence, their dimensions are the same; that is,
$d(p^nG) = d(p^nG')$ for all $n \ge 0$, and so $U_p(n, G) = U_p(n, G')$.

Conversely, assume that $U_p(n, G) = U_p(n, G')$ for all $n \ge 0$. If $G = \bigoplus_i C_i$
and $G' = \bigoplus_j C'_j$, where the $C_i$ and $C'_j$ are cyclic, then Lemma 6.12 shows that
there are the same number of summands of each type, and so it is a simple matter
to construct an isomorphism $G \to G'$. •


Definition. If $G$ is a $p$-primary abelian group, then the elementary divisors of
$G$ are the numbers $p^{n+1}$, each repeated with multiplicity $U_p(n, G)$.

If $G$ is a finite abelian group, then its elementary divisors are the elementary
divisors of all its primary components.

For example, the elementary divisors of the abelian group $\mathbb{I}_2 \oplus \mathbb{I}_2 \oplus \mathbb{I}_2$ are
(2, 2, 2), and the elementary divisors of $\mathbb{I}_6$ are (2, 3). The elementary divisors of
$\mathbb{I}_2 \oplus \mathbb{I}_2 \oplus \mathbb{I}_4 \oplus \mathbb{I}_8$ are (2, 2, 4, 8).
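These examples can be reproduced mechanically. Here is a small Python sketch, assuming the group is given as a list of orders of cyclic summands (not necessarily primary); it splits each order into prime powers, which is justified by Theorem 2.126. The helper names are hypothetical.

```python
def prime_power_factors(m):
    """Return the prime-power factors of m, e.g. 12 -> [4, 3]."""
    factors, p = [], 2
    while p * p <= m:
        if m % p == 0:
            q = 1
            while m % p == 0:
                m //= p
                q *= p
            factors.append(q)
        p += 1
    if m > 1:          # a single prime factor is left over
        factors.append(m)
    return factors


def elementary_divisors(cyclic_orders):
    """Elementary divisors of the direct sum of cyclic groups I_m,
    one summand for each m in `cyclic_orders`."""
    divisors = []
    for m in cyclic_orders:
        divisors.extend(prime_power_factors(m))
    return sorted(divisors)


print(elementary_divisors([2, 2, 2]))     # [2, 2, 2]
print(elementary_divisors([6]))           # [2, 3]
print(elementary_divisors([2, 2, 4, 8]))  # [2, 2, 4, 8]
```

Two lists of cyclic orders describe isomorphic groups exactly when this function returns the same sorted list for both; for instance, `[12, 18]` and `[4, 3, 2, 9]` both give `[2, 3, 4, 9]`.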


Theorem 6.15 (Fundamental Theorem of Finite Abelian Groups). Finite
abelian groups $G$ and $G'$ are isomorphic if and only if they have the same
elementary divisors; that is, any two decompositions of $G$ and $G'$ as direct sums of
primary cyclic groups have the same number of summands of each order.

Proof. By the primary decomposition, Theorem 6.5(ii), $G \cong G'$ if and only if,
for each prime $p$, their primary components are isomorphic: $G_p \cong G'_p$. The
result now follows from Theorem 6.13. •

The results of this section can be generalized from finite abelian groups to
finitely generated abelian groups, where an abelian group $G$ is finitely generated
if there are finitely many elements $a_1, \ldots, a_n \in G$ so that every $x \in G$ is a
linear combination of them: $x = \sum_i m_ia_i$, where $m_i \in \mathbb{Z}$ for all $i$. The basis
theorem generalizes: every finitely generated abelian group $G$ is a direct sum of
cyclic groups, each of which is a finite primary group or an infinite cyclic group.
One calls a direct sum of infinite cyclic groups a free abelian group. Thus, every
finitely generated abelian group is a direct sum of a free abelian group and a finite
group. The fundamental theorem also generalizes: given two decompositions of
$G$ into direct sums of infinite and primary cyclic groups, the number of cyclic
summands of each type is the same in both decompositions. The basis theorem
is no longer true for abelian groups that are not finitely generated; for example,
the additive group $\mathbb{Q}$ of rational numbers is not a direct sum of cyclic groups.


Exercises 


6.1 (i) Give an example of an abelian group $G = S \oplus T$ having a subgroup $A$
such that $A \ne (S \cap A) \oplus (T \cap A)$.

(ii) Suppose that $G$ is an abelian group and that $G = S \oplus T$. If $H$ is a
subgroup with $S \subseteq H \subseteq G$, prove that $H = S \oplus (T \cap H)$.

*6.2 (i) If $G$ is an (additive) abelian group and $X$ is a nonempty subset of $G$,
prove that $\langle X \rangle$, the subgroup generated by $X$, is the set of all linear
combinations of elements in $X$ with coefficients in $\mathbb{Z}$:

$\langle X \rangle = \{\sum_i m_ix_i : x_i \in X \text{ and } m_i \in \mathbb{Z}\}$.

Compare this exercise with Proposition 2.77.

(ii) If $S$ and $T$ are subgroups of $G$, prove that $S + T = \langle S \cup T \rangle$.

*6.3 Let $G$ be an abelian group, not necessarily primary. Define a subgroup $S \subseteq G$ to
be a pure subgroup if $S \cap mG = mS$ for all $m \in \mathbb{Z}$. Prove that if $G$ is a $p$-primary
abelian group, where $p$ is a prime, then a subgroup $S \subseteq G$ is pure as just
defined if and only if $S \cap p^nG = p^nS$ for all $n \ge 0$ (the definition in the text).

*6.4 If $G$ is a possibly infinite abelian group, define the torsion$^5$ subgroup $tG$ of $G$ as

$tG = \{a \in G : a \text{ has finite order}\}$.

(i) Prove that $tG$ is a pure subgroup of $G$. (There exist abelian groups $G$
whose torsion subgroup $tG$ is not a direct summand; hence, a pure
subgroup need not be a direct summand.)

5 This terminology comes from algebraic topology. To each space $X$, one assigns a
sequence of abelian groups, called homology groups, and if $X$ is "twisted," then there are
elements of finite order in some of these groups.

(ii) Prove that $G/tG$ is an abelian group in which every nonzero element has
infinite order.

6.5 (i) If $G$ and $H$ are finite abelian groups, prove, for all primes $p$ and all $n \ge 0$,
that

$U_p(n, G \oplus H) = U_p(n, G) + U_p(n, H)$.

(ii) If $A$, $B$, and $C$ are finite abelian groups, prove that $A \oplus B \cong A \oplus C$
implies $B \cong C$.

(iii) If $A$ and $B$ are finite abelian groups, prove that $A \oplus A \cong B \oplus B$ implies
$A \cong B$.

6.6 If $n$ is a positive integer, then a partition of $n$ is a sequence of positive integers
$i_1 \le i_2 \le \cdots \le i_r$ with $i_1 + i_2 + \cdots + i_r = n$. If $p$ is a prime, prove that the
number of abelian groups of order $p^n$, up to isomorphism, is equal to the number of
partitions of $n$.

6.7 Up to isomorphism, how many abelian groups are there of order 288?

6.8 Prove the Fundamental Theorem of Arithmetic by applying the Fundamental
Theorem of Finite Abelian Groups to $G = \mathbb{I}_n$.

*6.9 If $G$ is a finite abelian group, define

$\nu_k(G)$ = the number of elements in $G$ of order $k$.

Prove that two finite abelian groups $G$ and $G'$ are isomorphic if and only if $\nu_k(G) =
\nu_k(G')$ for all integers $k$. (This result is not true for nonabelian groups; see
Proposition 6.29.)

6.10 Prove that the additive group $\mathbb{Q}$ is not a direct sum: $\mathbb{Q} \not\cong A \oplus B$, where $A$ and $B$
are nonzero subgroups.

*6.11 (i) Let $S$ be a subgroup of a $p$-primary abelian group $G$, and let $\pi : G \to G/S$
be the natural map $g \mapsto g + S$. Prove that $S$ is a pure subgroup
of $G$ if and only if each $g + S \in G/S$ has a pre-image $g' \in G$ (that is,
$\pi(g') = g + S$) with $g'$ and $g + S$ having the same order.

(ii) Prove that a subgroup $S$ of a finite $p$-primary abelian group $G$ is pure if
and only if it is a direct summand.

6.12 Let $G$ be a finite abelian group. Prove that if $x \in G$ has maximal order (that is, if $x$
has order $n$, then there is no element in $G$ having larger order), then $\langle x \rangle$ is a direct
summand of $G$.

6.13 Let $F$ and $F'$ be free abelian groups. If $F$ is a direct sum of $m$ infinite cyclic groups
and $F'$ is a direct sum of $n$ infinite cyclic groups, prove that $F \cong F'$ if and only if
$m = n$.

6.14 (i) If $F = \langle x_1 \rangle \oplus \cdots \oplus \langle x_n \rangle$ is a free abelian group, prove that every $z \in F$
has a unique expression of the form $z = m_1x_1 + \cdots + m_nx_n$, where
$m_i \in \mathbb{Z}$ for all $i$. One calls $x_1, \ldots, x_n$ a basis of $F$.

(ii) Let $X = x_1, \ldots, x_n$ be a basis of a free abelian group $F$. Prove that if $A$
is any abelian group and if $a_1, \ldots, a_n$ is any list of elements in $A$, then
there exists a unique homomorphism $f : F \to A$ with $f(x_i) = a_i$ for
all $i$.

6.2 The Sylow Theorems

We return to nonabelian groups, and so we revert to the multiplicative notation. 
The Sylow theorems give an analog for finite nonabelian groups of the primary 
decomposition for finite abelian groups. 

Recall that a group $G$ is simple if $G \ne \{1\}$ and $G$ has no normal subgroups
other than $\{1\}$ and $G$ itself. We saw, in Proposition 2.78, that the abelian simple
groups are precisely the cyclic groups $\mathbb{I}_p$ of prime order $p$, and we saw, in
Theorem 2.83, that $A_n$ is a nonabelian simple group for all $n \ge 5$. In fact, $A_5$ is the
nonabelian simple group of smallest order. How can one prove that a nonabelian
group $G$ of order less than $60 = |A_5|$ is not simple? Exercise 2.105 states that
if $G$ is a group of order $|G| = mp$, where $p$ is prime and $1 < m < p$, then $G$
is not simple. This exercise shows that many of the numbers less than 60 are not
orders of simple groups. After throwing out all prime powers (by Exercise 2.106
on page 204, groups of prime power order are never nonabelian simple), the only
remaining possibilities are

12, 18, 24, 30, 36, 40, 45, 48, 50, 54, 56.

The solution to the exercise uses Cauchy's theorem, which says that $G$ has an
element of order $p$, hence a subgroup of order $p$. We shall see that if $G$ has a
subgroup of order $p^e$ instead of $p$, where $p^e$ is the highest power of $p$ dividing
$|G|$, then Exercise 2.105 can be generalized, and the list of candidates can be
shortened to 30, 40, and 56.
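The list of eleven remaining orders can be checked by a short computation. The following Python sketch is only an illustration of the sieve described above (discard prime powers, then discard every $n = mp$ with $p$ prime and $1 < m < p$); the helper names are hypothetical.

```python
def smallest_prime_factor(n):
    """Smallest prime dividing n, for n >= 2."""
    p = 2
    while p * p <= n:
        if n % p == 0:
            return p
        p += 1
    return n


def is_prime_power(n):
    """True if n = p^k for a prime p and some k >= 1."""
    p = smallest_prime_factor(n)
    while n % p == 0:
        n //= p
    return n == 1


def excluded_by_exercise(n):
    """True if n = m*p with p prime and 1 < m < p (Exercise 2.105)."""
    for p in range(2, n + 1):
        if smallest_prime_factor(p) == p and n % p == 0 and 1 < n // p < p:
            return True
    return False


remaining = [n for n in range(2, 60)
             if not is_prime_power(n) and not excluded_by_exercise(n)]
print(remaining)  # [12, 18, 24, 30, 36, 40, 45, 48, 50, 54, 56]
```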

The first book on Group Theory, Traité des Substitutions et des Équations
Algébriques, by C. Jordan, was published in 1870 (more than half of it is devoted
to Galois Theory, then called the Theory of Equations). At about the same time,
but too late for publication in Jordan's book, three fundamental theorems were
discovered. In 1868, E. Schering proved the basis theorem: every finite abelian
group is a direct product of primary cyclic groups; in 1870, L. Kronecker,
unaware of Schering's proof, also proved this result. In 1878, G. Frobenius and L.
Stickelberger proved the fundamental theorem of finite abelian groups. In 1872,
L. Sylow showed, for every finite group $G$ and every prime $p$, that if $p^e$ is the
largest power of $p$ dividing $|G|$, then $G$ has a subgroup of order $p^e$.

Recall that a $p$-group is a finite group $G$ in which every element has order
some power of a prime $p$; equivalently, $G$ has order $p^k$ for some $k \ge 0$. (When
working wholly in the context of abelian groups, as in the last section, one calls
$G$ a $p$-primary group.)


Definition. Let $p$ be a prime. A Sylow $p$-subgroup of a finite group $G$ is a
maximal $p$-subgroup $P$.

Maximality means that if $Q$ is a $p$-subgroup of $G$ and $P \le Q$, then $P = Q$.
Sylow $p$-subgroups always exist: indeed, we now show that if $S$ is any
$p$-subgroup of $G$ (perhaps $S = \{1\}$), then there exists a Sylow $p$-subgroup $P$
containing $S$. If there is no $p$-subgroup strictly containing $S$, then $S$ itself is
a maximal $p$-subgroup; that is, $S$ is a Sylow $p$-subgroup. Otherwise, there is a
$p$-subgroup $P_1$ with $S < P_1$. If $P_1$ is maximal, it is Sylow, and we are done.
Otherwise, there is some $p$-subgroup $P_2$ with $P_1 < P_2$; hence, $|P_1| < |P_2|$.
This procedure of producing larger and larger $p$-subgroups $P_i$ must end after a
finite number of steps, for $|G|$ is finite; the largest $P_i$ must, therefore, be a Sylow
$p$-subgroup.

Example 6.16. 

Let $G$ be a finite group of order $|G| = p^em$, where $p$ is a prime and $p \nmid m$.
We show that if there exists a subgroup $P$ of order $p^e$, then $P$ is a Sylow
$p$-subgroup of $G$. If $Q$ is a $p$-subgroup with $P \le Q \le G$, then $|P| = p^e$ divides $|Q|$.
But if $|Q| = p^k$, then $p^k \mid p^em$ and $k \le e$; that is, $|Q| = p^e$ and $Q = P$. ◄

Definition. If $H$ is a subgroup of a group $G$, then a conjugate of $H$ is a
subgroup of $G$ of the form

$aHa^{-1} = \{aha^{-1} : h \in H\}$,

where $a \in G$.

Conjugate subgroups are isomorphic: if $H \le G$, then $h \mapsto aha^{-1}$ is an
injective homomorphism $H \to G$ with image $aHa^{-1}$. The converse is false:
the four-group $V$ contains several subgroups of order 2 which are, of course,
isomorphic; they cannot be conjugate because $V$ is abelian. On the other hand,
all subgroups of order 2 in $S_3$ are conjugate; for example, $\langle (1\ 3) \rangle = a\langle (1\ 2) \rangle a^{-1}$,
where $a = (2\ 3)$.
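The $S_3$ computation can be verified directly. In the sketch below, a permutation $f$ of $\{1, 2, 3\}$ is stored as the tuple $(f(1), f(2), f(3))$ and products are composed right to left; this encoding and the function names are assumptions made for illustration only.

```python
from itertools import permutations


def compose(f, g):
    """Return the composite f∘g (apply g first); f[i-1] stores f(i)."""
    return tuple(f[g[i] - 1] for i in range(len(g)))


def inverse(f):
    inv = [0] * len(f)
    for i, fi in enumerate(f, start=1):
        inv[fi - 1] = i
    return tuple(inv)


def conjugate_subgroup(a, H):
    """Return a H a^{-1} as a frozenset of permutations."""
    a_inv = inverse(a)
    return frozenset(compose(compose(a, h), a_inv) for h in H)


identity = (1, 2, 3)
t12 = (2, 1, 3)   # the transposition (1 2)
t13 = (3, 2, 1)   # the transposition (1 3)
a = (1, 3, 2)     # the transposition (2 3)

H = frozenset({identity, t12})
print(conjugate_subgroup(a, H) == frozenset({identity, t13}))  # True

S3 = list(permutations((1, 2, 3)))
conjugates = {conjugate_subgroup(g, H) for g in S3}
print(len(conjugates))  # 3
```

The second output says that $\langle (1\ 2) \rangle$ has exactly three conjugates in $S_3$, namely all three subgroups of order 2.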

The ideas of group actions are going to be used, and so we now recall the 
notions of orbit and stabilizer which we discussed in Chapter 2. 

Definition. If $X$ is a set and $G$ is a group, then $G$ acts on $X$ if, for each $g \in G$,
there is a function $\alpha_g : X \to X$, such that

(i) $\alpha_g \circ \alpha_h = \alpha_{gh}$ for all $g, h \in G$;

(ii) $\alpha_1 = 1_X$, the identity function.

Definition. If $G$ acts on $X$ and $x \in X$, then the orbit of $x$, denoted by $O(x)$, is
the subset of $X$

$O(x) = \{\alpha_g(x) : g \in G\} \subseteq X$;

the stabilizer of $x$, denoted by $G_x$, is the subgroup of $G$

$G_x = \{g \in G : \alpha_g(x) = x\} \le G$.

A group $G$ acts on $X = \mathrm{Sub}(G)$, the set of all its subgroups, by conjugation:
if $g \in G$, then $g$ acts by $\alpha_g(H) = gHg^{-1}$, where $H \le G$. The orbit of
a subgroup $H$ consists of all its conjugates; the stabilizer of $H$ is $\{g \in G :
gHg^{-1} = H\}$. This last subgroup has a name.

Definition. If $H$ is a subgroup of a group $G$, then the normalizer of $H$ in $G$ is
the subgroup

$N_G(H) = \{g \in G : gHg^{-1} = H\}$.

Of course, $H \lhd N_G(H)$, and so the quotient group $N_G(H)/H$ is defined.

Proposition 6.17. If $H$ is a subgroup of a finite group $G$, then the number of
conjugates of $H$ in $G$ is $[G : N_G(H)]$.

Proof This is a special case of Theorem 2.141: the size of the orbit of an 
element is the index of its stabilizer. • 

Lemma 6.18. Let P be a Sylow p-subgroup of a finite group G. 

(i) Every conjugate of P is also a Sylow p-subgroup of G. 

(ii) $|N_G(P)/P|$ is prime to $p$.

(iii) If $g \in G$ has order some power of $p$ and if $gPg^{-1} = P$, then $g \in P$.

Proof.

(i) If $g \in G$, then $gPg^{-1}$ is a $p$-subgroup of $G$; if it is not a maximal such, then
there is a $p$-subgroup $Q$ with $gPg^{-1} < Q$. Hence, $P < g^{-1}Qg$, contradicting
the maximality of $P$.

(ii) If $p$ divides $|N_G(P)/P|$, then Cauchy's theorem shows that $N_G(P)/P$
contains an element $gP$ of order $p$, and hence $N_G(P)/P$ contains a (cyclic)
subgroup $S^* = \langle gP \rangle$ of order $p$. By the correspondence theorem (Theorem 2.121),
there is a subgroup $S$ with $P \le S \le N_G(P)$ such that $S/P = S^*$. But $S$ is a
$p$-subgroup of $N_G(P) \le G$ (by Exercise 2.88 on page 187) strictly larger than
$P$, and this contradicts the maximality of $P$. We conclude that $p$ does not divide
$|N_G(P)/P|$.

(iii) The element $g$ lies in $N_G(P)$, by the definition of normalizer. If $g \notin P$, then
the coset $gP$ is a nontrivial element of $N_G(P)/P$ having order some power of
$p$; in light of part (ii), this contradicts Lagrange's theorem. •

Since every conjugate of a Sylow $p$-subgroup is also a Sylow $p$-subgroup, it
is reasonable to let $G$ act by conjugation on a set of Sylow $p$-subgroups.

Theorem 6.19 (Sylow). Let $G$ be a finite group of order $p^em$, where $p$ is a
prime and $p \nmid m$, and let $P$ be a Sylow $p$-subgroup of $G$.

(i) Every Sylow $p$-subgroup is conjugate to $P$.

(ii) If there are $r$ Sylow $p$-subgroups, then $r$ is a divisor of $|G|/p^e$ and

$r \equiv 1 \bmod p$.

Proof. Let $X = \{P_1, \ldots, P_r\}$ be the set of all the conjugates of $P$, where we
have denoted $P$ by $P_1$. If $Q$ is any Sylow $p$-subgroup of $G$, then $Q$ acts on $X$ by
conjugation: if $a \in Q$ and $P_i = g_iPg_i^{-1}$, then

$\alpha_a(P_i) = \alpha_a(g_iPg_i^{-1}) = a(g_iPg_i^{-1})a^{-1} = (ag_i)P(ag_i)^{-1} \in X$.

By Corollary 2.142, the number of elements in any orbit is a divisor of $|Q|$; that
is, every orbit has size some power of $p$ (because $Q$ is a $p$-group). If there is
an orbit of size 1, then there is some $P_j$ with $aP_ja^{-1} = P_j$ for all $a \in Q$. By
Lemma 6.18, we have $a \in P_j$ for all $a \in Q$; that is, $Q \le P_j$. But $Q$, being a
Sylow $p$-subgroup, is a maximal $p$-subgroup of $G$, and so $Q = P_j$. In particular,
if $Q = P_1$, then every orbit has size an honest power of $p$ except one, the orbit
consisting of $P_1$ alone. We conclude that $|X| = r \equiv 1 \bmod p$.

Suppose now that there is some Sylow $p$-subgroup $Q$ that is not a conjugate
of $P$; thus, $Q \ne P_i$ for any $i$. Again, we let $Q$ act on $X$, and again we ask if there
is an orbit of size 1, say, $\{P_j\}$. As in the previous paragraph, this implies $Q = P_j$,
contrary to our present assumption that $Q \notin X$. Hence, there are no orbits of size
1, which says that each orbit has size an honest power of $p$. It follows that $|X| =
r$ is a multiple of $p$; that is, $r \equiv 0 \bmod p$, which contradicts the congruence
$r \equiv 1 \bmod p$. Therefore, no such $Q$ can exist, and so all Sylow $p$-subgroups
are conjugate to $P$. Finally, since all Sylow $p$-subgroups are conjugate, we have
$r = [G : N_G(P)]$, and so $r$ is a divisor of $|G| = p^em$. But $(r, p) = 1$, because
$r \equiv 1 \bmod p$, so that $r \mid p^em$ implies $r \mid m$; that is, $r \mid |G|/p^e$. •


Corollary 6.20. A finite group $G$ has a unique Sylow $p$-subgroup $P$, for some
prime $p$, if and only if $P \lhd G$.

Proof. Assume that $P$, a Sylow $p$-subgroup of $G$, is unique. For each $a \in G$,
the conjugate $aPa^{-1}$ is also a Sylow $p$-subgroup; by uniqueness, $aPa^{-1} = P$
for all $a \in G$, and so $P \lhd G$.

Conversely, assume that $P \lhd G$. If $Q$ is any Sylow $p$-subgroup, then $Q =
aPa^{-1}$ for some $a \in G$; but $aPa^{-1} = P$, by normality, and so $Q = P$. •

The following result gives the order of a Sylow subgroup.

Theorem 6.21 (Sylow). If $G$ is a finite group of order $p^em$, where $p$ is a
prime and $p \nmid m$, then every Sylow $p$-subgroup $P$ of $G$ has order $p^e$.

Proof. We first show that $p \nmid [G : P]$. Now

$[G : P] = [G : N_G(P)][N_G(P) : P]$.

The first factor, $[G : N_G(P)] = r$, is the number of conjugates of $P$ in $G$,
and we know that $r \equiv 1 \bmod p$; hence, $p$ does not divide $[G : N_G(P)]$. The
second factor is $[N_G(P) : P] = |N_G(P)/P|$; this, too, is not divisible by $p$, by
Lemma 6.18(ii). Therefore, $p$ does not divide $[G : P]$, by Euclid's lemma.

Now $|P| = p^k$ for some $k \le e$, and so

$[G : P] = |G|/|P| = p^em/p^k = p^{e-k}m$.

Since $p$ does not divide $[G : P]$, we must have $k = e$; that is, $|P| = p^e$. •

Example 6.22. 

(i) If $G$ is a finite abelian group, then a Sylow $p$-subgroup is just its $p$-primary
component. Since $G$ is abelian, every subgroup is normal, and so $G$ has a
unique Sylow $p$-subgroup for every prime $p$.

(ii) Let $G = S_4$. Now $|S_4| = 24 = 2^3 \cdot 3$. Thus, a Sylow 2-subgroup of $S_4$ has
order 8. We have seen, in Exercise 2.107 on page 204, that $S_4$ contains a
copy of the dihedral group $D_8$ consisting of the symmetries of a square.
The Sylow theorem says that all subgroups of order 8 are conjugate, hence
isomorphic, to $D_8$. Moreover, the number $r$ of Sylow 2-subgroups is a
divisor of 24/8 congruent to 1 mod 2; that is, $r$ is an odd divisor of 3.
Since $r \ne 1$ (see Exercise 6.15 on page 496), there are exactly 3 Sylow
2-subgroups; $S_4$ has exactly 3 subgroups of order 8. ◄
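The arithmetic constraints of Theorem 6.19(ii) are easy to tabulate. The following Python sketch lists every $r$ with $r \mid |G|/p^e$ and $r \equiv 1 \bmod p$; it only enumerates numerical candidates and does not decide which of them actually occur for a given group. The function name is hypothetical.

```python
def sylow_count_candidates(order, p):
    """All r dividing m = order / p^e (with p not dividing m)
    such that r is congruent to 1 mod p."""
    m = order
    while m % p == 0:
        m //= p
    return [r for r in range(1, m + 1) if m % r == 0 and r % p == 1]


print(sylow_count_candidates(24, 2))  # [1, 3]
print(sylow_count_candidates(30, 5))  # [1, 6]
print(sylow_count_candidates(40, 5))  # [1]
```

For $|G| = 24$ and $p = 2$ the candidates are 1 and 3, matching the example; when the only candidate is 1, as for $|G| = 40$ and $p = 5$, Corollary 6.20 forces the Sylow $p$-subgroup to be normal.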

Here is a second proof of the last Sylow theorem, due to Wielandt. 

Theorem 6.23 (= Theorem 6.21). If $G$ is a finite group of order $p^em$, where
$p$ is a prime and $p \nmid m$, then $G$ has a subgroup of order $p^e$.

Proof. If $X$ is the family of all those subsets of $G$ having exactly $p^e$ elements,
then $|X| = \binom{p^em}{p^e}$; by Exercise 1.66 on page 56, $p \nmid |X|$. Now $G$ acts on $X$: define
$\alpha_g(B) = gB$, for $g \in G$ and $B \in X$, where $gB = \{gb : b \in B\}$. If $p$ divides
$|O(B)|$ for every $B \in X$, then $p$ is a divisor of $|X|$, for $X$ is the disjoint union of
orbits, by Proposition 2.140. As $p \nmid |X|$, there exists a subset $B$ with $|B| = p^e$
and with $|O(B)|$ not divisible by $p$. If $G_B$ is the stabilizer of this subset $B$, then
Theorem 2.141 gives $[G : G_B] = |O(B)|$, and so $|G| = |G_B| \cdot |O(B)|$. Since
$p^e \mid |G|$ and $p \nmid |O(B)|$, repeated application of Euclid's lemma gives $p^e \mid |G_B|$.
Therefore, $p^e \le |G_B|$.

To prove the reverse inequality, choose an element $b \in B$ and define a
function $\tau : G_B \to B$ by $g \mapsto gb$. Note that $\tau(g) = gb \in gB = B$, for $g \in G_B$, the
stabilizer of $B$. If $g, h \in G_B$ and $h \ne g$, then $\tau(h) = hb \ne gb = \tau(g)$; that is,
$\tau$ is an injection. We conclude that $|G_B| \le |B| = p^e$, and so $G_B$ is a subgroup
of $G$ of order $p^e$. •
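The counting fact that drives this proof, namely that $p$ does not divide $\binom{p^em}{p^e}$ when $p \nmid m$, can be spot-checked numerically. This sketch uses `math.comb` from the Python standard library (available in Python 3.8 and later); the sample triples $(p, e, m)$ are arbitrary choices, and a finite check is of course not a proof.

```python
from math import comb

# (p, e, m) with p prime and p not dividing m
samples = [(2, 3, 3), (3, 1, 4), (5, 1, 6), (2, 3, 7), (3, 2, 2)]

residues = []
for p, e, m in samples:
    n = p**e * m                       # |G| = p^e * m
    residues.append(comb(n, p**e) % p)

# Every residue is nonzero: p does not divide the number of
# subsets of size p^e in a set of size p^e * m.
print(all(r != 0 for r in residues))  # True
```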

If $p$ is a prime not dividing the order of a finite group $G$, then a Sylow
$p$-subgroup of $G$ has order $p^0 = 1$. Thus, when speaking of Sylow $p$-subgroups
of $G$, one usually avoids this trivial case and assumes that $p$ is a divisor of $|G|$.

We can now generalize Exercise 2.122 on page 205 and its solution. 

Lemma 6.24. There is no nonabelian simple group $G$ of order $|G| = p^em$,
where $p$ is prime and $m > 1$, $p \nmid m$, and $p^e \nmid (m - 1)!$.

Proof. Suppose that such a simple group $G$ exists. By Sylow's theorem, $G$
contains a subgroup $P$ of order $p^e$, hence of index $m$. By Theorem 2.61, the
representation of $G$ on the cosets of $P$, there exists a homomorphism $\varphi : G \to S_m$
with $\ker \varphi \le P$. Since $G$ is simple, however, its only normal subgroups are $\{1\}$
and $G$; as $\ker \varphi \le P$ and $P \ne G$ (because $m > 1$), we have $\ker \varphi = \{1\}$ and $\varphi$ is
an injection; that is, $G \cong \varphi(G) \le S_m$. By Lagrange's
theorem, $p^em \mid m!$, and so $p^e \mid (m - 1)!$, contrary to the hypothesis. •

Proposition 6.25. There are no nonabelian simple groups of order less than 60. 

Proof. If $p$ is a prime, then Exercise 2.106 on page 204 says that every $p$-group
$G$ with $|G| > p$ is not simple.

The reader may check that the only integers $n$ between 2 and 59, neither a
prime power nor having a factorization of the form $n = p^em$ as in the statement
of the lemma, are $n = 30$, 40, and 56. By the lemma, these three numbers are
the only candidates for orders of nonabelian simple groups of order < 60.

Assume that there is a simple group $G$ of order 30. Let $P$ be a Sylow
5-subgroup of $G$, so that $|P| = 5$. The number $r_5$ of conjugates of $P$ is a divisor
of 30/5 = 6 and $r_5 \equiv 1 \bmod 5$. Now $r_5 \ne 1$, lest $P \lhd G$, so that $r_5 = 6$. By
Lagrange's theorem, the intersection of any two of these is trivial (intersections
of Sylow subgroups can be more complicated; see Exercise 6.16 on page 497).
There are 4 nonidentity elements in each of these subgroups, and so there are
6 × 4 = 24 nonidentity elements in their union. Similarly, the number $r_3$ of Sylow
3-subgroups of $G$ is 10 (for $r_3 \ne 1$, $r_3$ is a divisor of 30/3, and $r_3 \equiv 1 \bmod 3$).
There are 2 nonidentity elements in each of these subgroups, and so the union of
these subgroups has 20 nonidentity elements. We have exceeded the number of
elements in $G$, and so $G$ cannot be simple.

Let $G$ be a group of order 40, and let $P$ be a Sylow 5-subgroup of $G$. If $r$ is
the number of conjugates of $P$, then $r \mid 40/5$ and $r \equiv 1 \bmod 5$. These conditions
force $r = 1$, so that $P \lhd G$. Therefore, no simple group of order 40 can exist.

Finally, assume that there is a simple group $G$ of order 56. If $P$ is a Sylow
7-subgroup of $G$, then $P$ must have $r = 8$ conjugates (for $r \mid 56/7$ and $r \equiv
1 \bmod 7$). Since these groups are cyclic of prime order, the intersection of any
pair of them is $\{1\}$, and so there are 48 nonidentity elements in their union. Thus,
adding the identity, we have accounted for 49 elements of $G$. Now a Sylow
2-subgroup $Q$ has order 8, and so it contributes 7 more nonidentity elements,
giving 56 elements. But there is a second Sylow 2-subgroup, lest $Q \lhd G$, and we
have exceeded our quota. Therefore, there is no simple group of order 56. •
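The check left to the reader in this proof can be carried out by a short program. The sketch below, with hypothetical helper names, keeps an integer $n$ only if it is not a prime power and if, for every prime $p$ dividing $n = p^em$, the hypothesis $p^e \nmid (m-1)!$ of Lemma 6.24 fails.

```python
from math import factorial


def prime_power_parts(n):
    """List of pairs (p^e, m) with n = p^e * m and p not dividing m,
    one pair for each prime p dividing n."""
    parts, p, rest = [], 2, n
    while p * p <= rest:
        if rest % p == 0:
            q = 1
            while rest % p == 0:
                rest //= p
                q *= p
            parts.append((q, n // q))
        p += 1
    if rest > 1:
        parts.append((rest, n // rest))
    return parts


def survives_lemma(n):
    """True if n is not a prime power and Lemma 6.24 excludes n
    for no prime p, i.e. p^e divides (m-1)! for every p."""
    parts = prime_power_parts(n)
    if len(parts) == 1:          # n is a prime power
        return False
    return all(factorial(m - 1) % q == 0 for q, m in parts)


survivors = [n for n in range(2, 60) if survives_lemma(n)]
print(survivors)  # [30, 40, 56]
```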

The "converse" of Lagrange's theorem is false: if $G$ is a finite group of order
$n$, and if $d \mid n$, then $G$ may not have a subgroup of order $d$. For example, we
proved, in Proposition 2.97, that the alternating group $A_4$ is a group of order 12
having no subgroup of order 6.


Proposition 6.26. Let $G$ be a finite group. If $p$ is a prime and if $p^k$ divides $|G|$,
then $G$ has a subgroup of order $p^k$.

Proof. If $|G| = p^em$, where $p \nmid m$, then a Sylow $p$-subgroup $P$ of $G$ has order
$p^e$. Hence, if $p^k$ divides $|G|$, then $p^k$ divides $|P|$. By Proposition 2.150, $P$ has
a subgroup of order $p^k$; a fortiori, $G$ has a subgroup of order $p^k$. •


What examples of $p$-groups have we seen? Of course, cyclic groups of order
$p^n$ are $p$-groups, as is any direct product of copies of these. By the basis theorem,
this describes all finite abelian $p$-groups. The only nonabelian examples we have
seen so far are the dihedral groups $D_{2n}$ (which are 2-groups when $n$ is a power
of 2), the quaternions $\mathbf{Q}$ of order 8 (of course, for every 2-group $A$, the direct
products $D_8 \times A$ and $\mathbf{Q} \times A$ are also nonabelian 2-groups), and the groups
$\mathrm{UT}(3, p)$ in Example 2.148 consisting of all upper triangular 3 × 3 matrices
over $\mathbb{F}_p$ of the form
$\begin{bmatrix} 1 & a & b \\ 0 & 1 & c \\ 0 & 0 & 1 \end{bmatrix}$.
The obvious generalization of $\mathrm{UT}(3, p)$ gives an
interesting family of nonabelian $p$-groups.


Definition. If $k$ is a field, then an $n \times n$ unitriangular matrix over $k$ is an upper
triangular matrix each of whose diagonal terms is 1. Define $\mathrm{UT}(n, k)$ to be the
set of all $n \times n$ unitriangular matrices over $k$.

Proposition 6.27. $\mathrm{UT}(n, k)$ is a subgroup of $\mathrm{GL}(n, k)$ for every field $k$.

Proof. If $A \in \mathrm{UT}(n, k)$, then $A = I + N$, where $N$ is strictly upper triangular;
that is, $N$ is an upper triangular matrix having only 0's on its diagonal. Note that
the sum and product of strictly upper triangular matrices is again strictly upper
triangular.

Let $e_1, \ldots, e_n$ be the standard basis of $k^n$. If $N$ is strictly upper triangular,
define $T : k^n \to k^n$ by $T(e_i) = Ne_i$, where $e_i$ is regarded as a column matrix.
Now $T$ satisfies the equations, for all $i$,

$T(e_1) = 0$ and $T(e_{i+1}) \in \langle e_1, \ldots, e_i \rangle$.

It is easy to see, by induction on $i$, that $T^i(e_j) = 0$ for all $j \le i$. It follows that
$T^n = 0$ and, hence, that $N^n = 0$. Thus, if $A \in \mathrm{UT}(n, k)$, then $A = I + N$,
where $N^n = 0$.

We can now show that $\mathrm{UT}(n, k)$ is a subgroup of $\mathrm{GL}(n, k)$. First of all, if $A$
is unitriangular, then it is nonsingular. In analogy to the power series expansion
$1/(1 + x) = 1 - x + x^2 - x^3 + \cdots$, we try $B = I - N + N^2 - N^3 + \cdots$ as the inverse
of $A = I + N$ (we note that the matrix power series stops after the term in $N^{n-1}$ because
$N^n = 0$). The reader may now check that $BA = I$; therefore, $A$ is nonsingular.
Moreover, since $N$ is strictly upper triangular, so is $-N + N^2 - N^3 + \cdots$, so
that $A^{-1} \in \mathrm{UT}(n, k)$. Finally, $(I + N)(I + M) = I + (N + M + NM)$ is
unitriangular, and so $\mathrm{UT}(n, k)$ is a subgroup of $\mathrm{GL}(n, k)$. •
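The finite geometric series for $(I + N)^{-1}$ can be tried on a concrete matrix. The following is a minimal sketch using integer entries (an assumption made to keep the arithmetic exact; over $\mathbb{F}_p$ one would reduce every entry mod $p$), with matrices stored as lists of rows and hypothetical function names.

```python
def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]


def unitriangular_inverse(A):
    """Inverse of A = I + N computed as I - N + N^2 - ... ± N^(n-1)."""
    n = len(A)
    I = identity(n)
    N = [[A[i][j] - I[i][j] for j in range(n)] for i in range(n)]
    B, power, sign = identity(n), identity(n), 1
    for _ in range(1, n):
        power = mat_mul(power, N)      # next power of N
        sign = -sign                   # alternate the signs
        B = [[B[i][j] + sign * power[i][j] for j in range(n)]
             for i in range(n)]
    return B


A = [[1, 2, 3],
     [0, 1, 4],
     [0, 0, 1]]
B = unitriangular_inverse(A)
print(B)                             # [[1, -2, 5], [0, 1, -4], [0, 0, 1]]
print(mat_mul(B, A) == identity(3))  # True
```

Note that the computed inverse is again unitriangular, as the proof asserts.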

Proposition 6.28. Let $q = p^e$, where $p$ is a prime. For each $n \ge 2$, $\mathrm{UT}(n, \mathbb{F}_q)$
is a $p$-group of order $q^{n(n-1)/2}$.

Proof. The number of entries in an $n \times n$ unitriangular matrix lying strictly
above the diagonal is $\frac{1}{2}(n^2 - n) = n(n - 1)/2$. Since each of these entries can be
any element of $\mathbb{F}_q$, there are exactly $q^{n(n-1)/2}$ $n \times n$ unitriangular matrices over
$\mathbb{F}_q$, and so this is the order of $\mathrm{UT}(n, \mathbb{F}_q)$. •

In Exercise 2.111 on page 204, we showed that $\mathrm{UT}(3, \mathbb{F}_2) \cong D_8$.

Recall Exercise 2.38 on page 143: if $G$ is a group and $x^2 = 1$ for all $x \in G$,
then $G$ is abelian. We now ask whether a group $G$ satisfying $x^p = 1$ for all
$x \in G$, where $p$ is an odd prime, must also be abelian.

Proposition 6.29. If $p$ is an odd prime, then there exists a nonabelian group $G$
of order $p^3$ with $x^p = 1$ for all $x \in G$.

Proof. If $G = \mathrm{UT}(3, \mathbb{F}_p)$, then $|G| = p^3$. If $A \in G$, then $A = I + N$, where
$N^3 = 0$; hence $N^p = 0$ because $p \ge 3$. Since $IN = N = NI$, the binomial
theorem applies to $(I + N)^p$; as $p$ divides each binomial coefficient $\binom{p}{i}$ with
$0 < i < p$, it gives $A^p = (I + N)^p = I^p + N^p = I$. •
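For small primes the proposition can be confirmed by brute force. This sketch enumerates $\mathrm{UT}(3, \mathbb{F}_p)$ as tuples of rows with entries reduced mod $p$; the function `check` and its return format are hypothetical, chosen only for this illustration.

```python
from itertools import product

I3 = ((1, 0, 0), (0, 1, 0), (0, 0, 1))


def mul(A, B, p):
    """Product of 3x3 matrices with entries reduced mod p."""
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(3)) % p
                       for j in range(3)) for i in range(3))


def power(A, k, p):
    R = I3
    for _ in range(k):
        R = mul(R, A, p)
    return R


def check(p):
    """Return (order, every x satisfies x^p = 1, group is nonabelian)."""
    G = [((1, a, b), (0, 1, c), (0, 0, 1))
         for a, b, c in product(range(p), repeat=3)]
    exponent_p = all(power(A, p, p) == I3 for A in G)
    nonabelian = any(mul(A, B, p) != mul(B, A, p) for A in G for B in G)
    return len(G), exponent_p, nonabelian


print(check(3))  # (27, True, True)
print(check(2))  # (8, False, True)
```

The case $p = 2$ shows why the hypothesis that $p$ is odd is needed: $\mathrm{UT}(3, \mathbb{F}_2)$ contains elements of order 4.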

In Exercise 6.9 on page 486, we defined $\nu_k(G)$ to be the number of elements
of order $k$ in a finite group $G$, and we proved that if $G$ and $H$ are abelian groups
with $\nu_k(G) = \nu_k(H)$ for all $k$, then $G$ and $H$ are isomorphic. This is false in
general, for if $p$ is an odd prime, then each of $\mathrm{UT}(3, \mathbb{F}_p)$ and $\mathbb{I}_p \times \mathbb{I}_p \times \mathbb{I}_p$ consists
of the identity and $p^3 - 1$ elements of order $p$.

Theorem 6.30. Let $\mathbb{F}_q$ denote the finite field with $q$ elements. Then

$|\mathrm{GL}(n, \mathbb{F}_q)| = (q^n - 1)(q^n - q)(q^n - q^2) \cdots (q^n - q^{n-1})$.

Proof. Let $V$ be an $n$-dimensional vector space over $\mathbb{F}_q$. We show first that
there is a bijection $\Phi : \mathrm{GL}(n, \mathbb{F}_q) \to \mathcal{B}$, where $\mathcal{B}$ is the set of all bases of $V$.
Choose, once for all, a basis $e_1, \ldots, e_n$ of $V$. If $T \in \mathrm{GL}(n, \mathbb{F}_q)$, define $\Phi(T) =
Te_1, \ldots, Te_n$. By Lemma 4.76, $\Phi(T) \in \mathcal{B}$ because $T$, being nonsingular,
carries a basis into a basis. But $\Phi$ is a bijection, for given a basis $v_1, \ldots, v_n$, there
is a unique linear transformation $S$, necessarily nonsingular (by Lemma 4.76),
with $Se_i = v_i$ for all $i$ (by Theorem 4.61).

Our problem now is to count the number of bases $v_1, \ldots, v_n$ of $V$. There
are $q^n$ vectors in $V$, and so there are $q^n - 1$ candidates for $v_1$ (the zero vector is
not a candidate). Having chosen $v_1$, we see that the candidates for $v_2$ are those
vectors not in $\langle v_1 \rangle$, the subspace spanned by $v_1$; there are thus $q^n - q$ candidates
for $v_2$. More generally, having chosen a linearly independent list $v_1, \ldots, v_i$, then
$v_{i+1}$ can be any vector not in $\langle v_1, \ldots, v_i \rangle$. Thus, there are $q^n - q^i$ candidates for
$v_{i+1}$. The result follows by induction on $n$. •

Corollary 6.31. $|\mathrm{GL}(n, \mathbb{F}_q)| = q^{n(n-1)/2}(q^n - 1)(q^{n-1} - 1) \cdots (q^2 - 1)(q - 1)$.

Proof. The number of powers of $q$ in the formula

$|\mathrm{GL}(n, \mathbb{F}_q)| = (q^n - 1)(q^n - q)(q^n - q^2) \cdots (q^n - q^{n-1})$

is $1 + 2 + \cdots + (n - 1)$, and $1 + 2 + \cdots + (n - 1) = \frac{1}{2}n(n - 1)$. •
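Both expressions for $|\mathrm{GL}(n, \mathbb{F}_q)|$ are easy to evaluate and compare. The sketch below, with hypothetical function names, computes the product of Theorem 6.30 and the factored form of Corollary 6.31.

```python
def gl_order(n, q):
    """(q^n - 1)(q^n - q) ... (q^n - q^(n-1)), as in Theorem 6.30."""
    order = 1
    for i in range(n):
        order *= q**n - q**i
    return order


def gl_order_factored(n, q):
    """q^(n(n-1)/2) (q^n - 1)(q^(n-1) - 1) ... (q - 1), as in Corollary 6.31."""
    order = q ** (n * (n - 1) // 2)
    for i in range(1, n + 1):
        order *= q**i - 1
    return order


print(gl_order(2, 2))  # 6
print(gl_order(3, 2))  # 168
print(all(gl_order(n, q) == gl_order_factored(n, q)
          for n in range(1, 6) for q in (2, 3, 4, 5, 7)))  # True
```

For instance, $|\mathrm{GL}(2, \mathbb{F}_2)| = (4 - 1)(4 - 2) = 6$ and $|\mathrm{GL}(3, \mathbb{F}_2)| = 7 \cdot 6 \cdot 4 = 168$.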

Theorem 6.32. If $p$ is a prime and $q = p^m$, then the unitriangular group
$\mathrm{UT}(n, \mathbb{F}_q)$ is a Sylow $p$-subgroup of $\mathrm{GL}(n, \mathbb{F}_q)$.

Proof. Now $|\mathrm{GL}(n, \mathbb{F}_q)| = q^{n(n-1)/2}(q^n - 1)(q^{n-1} - 1) \cdots (q^2 - 1)(q - 1)$, by
Corollary 6.31. Since $p$ divides $q$, it divides none of the factors $q^i - 1$, so that
the highest power of $p$ dividing $|\mathrm{GL}(n, \mathbb{F}_q)|$ is $q^{n(n-1)/2}$.
But $|\mathrm{UT}(n, \mathbb{F}_q)| = q^{n(n-1)/2}$, by Proposition 6.28, and so $\mathrm{UT}(n, \mathbb{F}_q)$ must be a
Sylow $p$-subgroup. •
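As a numerical sanity check of this order comparison, one can extract the largest power of $p$ dividing $|\mathrm{GL}(n, \mathbb{F}_q)|$ and compare it with $q^{n(n-1)/2}$. The sample parameters and function names below are arbitrary choices for illustration.

```python
def gl_order(n, q):
    """|GL(n, F_q)| = (q^n - 1)(q^n - q) ... (q^n - q^(n-1))."""
    order = 1
    for i in range(n):
        order *= q**n - q**i
    return order


def p_part(N, p):
    """Largest power of p dividing the positive integer N."""
    part = 1
    while N % p == 0:
        N //= p
        part *= p
    return part


# (p, e, n): q = p^e, matrices of size n x n
samples = [(2, 1, 3), (3, 1, 2), (2, 2, 3), (5, 1, 4)]
agree = all(p_part(gl_order(n, p**e), p) == (p**e) ** (n * (n - 1) // 2)
            for p, e, n in samples)
print(agree)  # True
```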

Corollary 6.33. If $p$ is a prime, then every finite $p$-group $G$ is isomorphic to a
subgroup of the unitriangular group $\mathrm{UT}(m, \mathbb{F}_p)$, where $m = |G|$.

Proof. We show first, for every $m \ge 1$, that the symmetric group $S_m$ can be
imbedded in $\mathrm{GL}(m, k)$, where $k$ is a field. Let $V$ be an $m$-dimensional vector
space over $k$, and let $v_1, \ldots, v_m$ be a basis of $V$. Define $\varphi : S_m \to \mathrm{GL}(V)$ by
$\sigma \mapsto T_\sigma$, where $T_\sigma : v_i \mapsto v_{\sigma(i)}$ for all $i$. It is easy to see that $\varphi$ is an injective
homomorphism.

By Cayley's theorem, $G$ can be imbedded in $S_G$; hence, $G$ can be imbedded
in $\mathrm{GL}(m, \mathbb{F}_p)$, where $m = |G|$. Now $G$ is contained in some Sylow $p$-subgroup
$P$ of $\mathrm{GL}(m, \mathbb{F}_p)$, for every $p$-subgroup lies in some Sylow $p$-subgroup. Since
all Sylow $p$-subgroups are conjugate, $P = a\,\mathrm{UT}(m, \mathbb{F}_p)\,a^{-1}$ for some $a \in
\mathrm{GL}(m, \mathbb{F}_p)$. Therefore,

$G \cong a^{-1}Ga \le a^{-1}Pa \le \mathrm{UT}(m, \mathbb{F}_p)$. •

A natural question is to find the Sylow subgroups of symmetric groups. This 
can be done, and the answer is in terms of a construction called wreath product. 

There are other directions in group theory. In an amazing collective effort 
at the end of the twentieth century, all finite simple groups were classified. We 
quote from The Classification of the Finite Simple Groups, by D. Gorenstein, R. 
Lyons, and R. Solomon. 

The existing proof of the classification of the finite simple groups 
runs to somewhere between 10,000 and 15,000 journal pages, spread 
across some 500 separate articles by more than 100 mathematicians, 
almost all written between 1950 and the early 1980’s. Moreover, 
it was not until the 1970’s that a global strategy was developed for 
attacking the complete classification problem. In addition, new sim- 
ple groups were being discovered throughout the entire period, ..., 
so that it was not even possible to state the full theorem in precise 
form ... until the early 1980’s. ... Considering the significance of 
the classification theorem, we believe that the present state of affairs 
provides compelling reasons for seeking a simpler proof, more co- 
herent and accessible, and with clear foundations. ... The arguments 
we give ... will cover between 3,000 and 4,000 pages. 

There is now a list of every finite simple group, and many important
properties of each of them are known. Many questions about arbitrary finite groups
can be reduced to problems about simple groups. Thus, the classification theo- 
rem can be used by checking, one by one, whether each simple group on the list 
satisfies the desired result. 

Another important direction is representation theory - the systematic study 
of homomorphisms of a group into groups of nonsingular matrices. One of the 
first applications of this theory is a theorem of Burnside: every group of order 
$p^mq^n$, where $p$ and $q$ are primes, must be solvable.


Exercises 


*6.15 Prove that $S_4$ has more than one Sylow 2-subgroup.

*6.16 Give an example of a finite group $G$ having 3 Sylow $p$-subgroups (for some prime
$p$) $P$, $Q$ and $R$ such that $P \cap Q = \{1\}$ and $P \cap R \neq \{1\}$.

6.17 Prove that every finite $p$-group is solvable.

*6.18 (Frattini argument) Let $K$ be a normal subgroup of a finite group $G$. If $P$ is a
Sylow $p$-subgroup of $K$ for some prime $p$, prove that

$$G = K N_G(P),$$

where $K N_G(P) = \{ab : a \in K \text{ and } b \in N_G(P)\}$.

6.19 Show that a Sylow 2-subgroup of $S_6$ is isomorphic to $D_8 \times \mathbb{I}_2$.

6.20 Let $Q$ be a normal $p$-subgroup of a finite group $G$. Prove that $Q \le P$ for every
Sylow $p$-subgroup $P$ of $G$.

6.21 For each prime divisor $p$ of the order of a finite group $G$, choose a Sylow
$p$-subgroup $Q_p$. Prove that $G = \bigl\langle \bigcup_p Q_p \bigr\rangle$.

6.22 (i) Let $G$ be a finite group and let $P$ be a Sylow $p$-subgroup of $G$. If $H \lhd G$,
prove that $HP/H$ is a Sylow $p$-subgroup of $G/H$ and $H \cap P$ is a Sylow
$p$-subgroup of $H$.

(ii) Let $P$ be a Sylow $p$-subgroup of a finite group $G$. Give an example of a
subgroup $H$ of $G$ with $H \cap P$ not a Sylow $p$-subgroup of $H$.

6.23 Prove that a Sylow 2-subgroup of $A_5$ has exactly 5 conjugates.

6.24 Prove that there are no simple groups of order 300, 312, 616, or 1000. 

6.25 Prove that if every Sylow subgroup of a finite group $G$ is normal, then $G$ is the
direct product of its Sylow subgroups.

6.26 For any group $G$, prove that if $H \lhd G$, then $Z(H) \lhd G$.

*6.27 If $p$ is a prime, prove that every group $G$ of order $2p$ is either cyclic or isomorphic
to $D_{2p}$.

6.28 If $0 \le r \le n$, define the $q$-binomial coefficient $\binom{n}{r}_q$ to be the number of linearly
independent $r$-lists in $(\mathbb{F}_q)^n$.

(i) Prove that 

["] q [n-r\ = [:\- 

(These coefficients arise in the study of hypergeometric series.)

(ii) Prove that there are exactly $\binom{n}{r}_q$ $r$-dimensional subspaces of $(\mathbb{F}_q)^n$.

(iii) Prove that

$$\binom{n}{r}_q = \frac{(q^n - 1)(q^{n-1} - 1)\cdots(q - 1)}{(q^r - 1)(q^{r-1} - 1)\cdots(q - 1)\,(q^{n-r} - 1)(q^{n-r-1} - 1)\cdots(q - 1)}.$$

(iv) Prove the analog of Lemma 1.14:

$$\binom{n}{r}_q = \binom{n-1}{r-1}_q + q^r \binom{n-1}{r}_q.$$

(v) Prove the analog of Exercise 1.29 on page 33:

6.29 Find $Z(\mathrm{UT}(3, \mathbb{F}_q))$ and $Z(\mathrm{UT}(4, \mathbb{F}_q))$.

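6.30 (i) Prove that $\mathrm{UT}(n, \mathbb{F}_q)$ has a normal series

$$\mathrm{UT}(n, \mathbb{F}_q) = G_0 \ge G_1 \ge \cdots \ge G_n = \{1\}$$

in which $G_i$ consists of all unitriangular $n \times n$ matrices
having all entries on the first $i$ superdiagonals 0. For example, if $n = 5$, then
$G_1$ consists of all matrices of the form

$$\begin{bmatrix}
1 & 0 & * & * & * \\
0 & 1 & 0 & * & * \\
0 & 0 & 1 & 0 & * \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1
\end{bmatrix}$$

and $G_2$ consists of all matrices of the form

$$\begin{bmatrix}
1 & 0 & 0 & * & * \\
0 & 1 & 0 & 0 & * \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1
\end{bmatrix}.$$

(ii) Prove that the factor groups $G_{i-1}/G_i$ are abelian for all $i \ge 1$.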


6.3 Ornamental Symmetry 

In Section 2.3, we defined an isometry of the plane to be a distance-preserving
function $\varphi : \mathbb{R}^2 \to \mathbb{R}^2$, and we saw, in Proposition 2.59, that $\mathrm{Isom}(\mathbb{R}^2)$, the set
of all the isometries of the plane, is a group under composition. For any subset
$\Omega$ of the plane, its symmetry group is defined by

$$\Sigma(\Omega) = \{\varphi \in \mathrm{Isom}(\mathbb{R}^2) : \varphi(\Omega) = \Omega\}.$$

For example, we saw, in Theorem 2.63, that the dihedral group $D_{2n}$ is isomorphic
to the symmetry group of a regular $n$-gon $\Omega$. In this section, we will study
symmetry groups of certain designs called friezes. Our discussion follows that
in Burn, Groups: A Path to Geometry.

We defined three types of isometry in Example 2.55: rotations, reflections,
and translations (but see Theorem 6.42; there is a fourth type). Identify the plane
$\mathbb{R}^2$ with the complex numbers $\mathbb{C}$ via $(a, b) \mapsto a + ib$. Thus, every point $(x, 0)$
is identified with the real number $x$ (in particular, the origin is identified with 0),
and the x-axis is identified with $\mathbb{R}$. We will use the notation $e^{i\theta}$ for numbers on
the unit circle instead of the normalized notation $e^{2\pi i\theta}$. This identification of $\mathbb{R}^2$
with $\mathbb{C}$ enables us to give simple algebraic formulas for isometries.

Example 6.34.

(i) Rotation about the origin by $\theta$ is the function $R_\theta$ sending the point with
polar coordinates $(r, \alpha)$ to the point with polar coordinates $(r, \theta + \alpha)$. This
isometry can now be written as $R_\theta : z \mapsto e^{i\theta}z$, for if $z = re^{i\alpha}$, then

$$R_\theta(z) = e^{i\theta}z = e^{i\theta}re^{i\alpha} = re^{i(\theta + \alpha)}.$$

(ii) An isometry $\rho_\ell$ is a reflection if there is a line $\ell$, called its axis, each
of whose points is fixed by $\rho_\ell$, which is the perpendicular-bisector of all
segments having endpoints $z$, $\rho_\ell(z)$. In particular, reflection about the
x-axis sends each point $(a, b)$ to $(a, -b)$; this is complex conjugation
$\sigma : z = a + ib \mapsto a - ib = \bar{z}$.

(iii) Translation by a vector $c$ is $\tau_c : z \mapsto z + c$. Remember that the identity
$z \mapsto z$ is a translation; it is the only translation having a fixed point. ◄
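The three formulas of Example 6.34 can be tried out numerically. The following is only a minimal sketch using Python's built-in complex numbers; the helper names (`rotation`, `conjugation`, `translation`, `preserves_distances`) are illustrative choices, not notation from the text, and the distance check is run on a handful of sample points rather than proved.

```python
import cmath
import math

def rotation(theta):
    """R_theta: z -> e^{i*theta} * z, rotation about the origin by theta."""
    u = cmath.exp(1j * theta)
    return lambda z: u * z

def conjugation(z):
    """sigma: z -> conj(z), reflection in the x-axis."""
    return z.conjugate()

def translation(c):
    """tau_c: z -> z + c."""
    return lambda z: z + c

def preserves_distances(phi, points, tol=1e-12):
    """Check |phi(p) - phi(q)| = |p - q| for every pair of sample points."""
    return all(abs(abs(phi(p) - phi(q)) - abs(p - q)) < tol
               for p in points for q in points)

samples = [0j, 1 + 0j, 2 - 3j, -1.5 + 0.25j]
R = rotation(0.7)

# The point with polar coordinates (2, 0.3) should go to (2, 0.7 + 0.3).
r, angle = cmath.polar(R(cmath.rect(2.0, 0.3)))
```

Here `r` and `angle` recover the polar coordinates $(r, \theta + \alpha)$ of the rotated point, and `preserves_distances` can be applied to each of the three kinds of map.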

Recall, if $\varphi$ is an isometry, that $\varphi(\ell)$ is a line whenever $\ell$ is a line, and that
$\varphi(C)$ is a circle whenever $C$ is a circle. In more detail, if $\ell = L[P, Q]$ is the line
determined by distinct points $P$ and $Q$, then $\varphi(L[P, Q]) = L[\varphi(P), \varphi(Q)]$, by
Lemma 2.56; if $C = C[P; PQ]$ is the circle with center $P$ and radius $PQ$, then
$\varphi(C[P; PQ]) = C[\varphi(P); \varphi(P)\varphi(Q)]$.

Here is a geometric lemma. 

Lemma 6.35. Let $A$, $P$, $Q$ be distinct points in the plane, let $C = C[P; PA]$
be the circle with center $P$ and radius $PA$, and let $C' = C[Q; QA]$ be the circle
with center $Q$ and radius $QA$. Then $C \cap C' = \{A\}$ if and only if $A$, $P$, $Q$ are
collinear.

Proof. We use analytic geometry. Draw $P$ and $Q$ as the points 0 and 1 on the
x-axis, and let $A = (a, b)$. The equation of $C$ is $x^2 + y^2 = |PA|^2 = a^2 + b^2$, and the
equation of $C'$ is $(x - 1)^2 + y^2 = |QA|^2 = (a - 1)^2 + b^2$. If $B = (p, q) \in C \cap C'$,
then there are equations

$$p^2 + q^2 = a^2 + b^2 \quad\text{and}\quad (p - 1)^2 + q^2 = (a - 1)^2 + b^2.$$

Hence,

$$(p - 1)^2 + (a^2 + b^2 - p^2) = (a - 1)^2 + b^2.$$

Simplifying, we get $p = a$ and $q = \pm b$. If $b \neq 0$, then there are two points in
$C \cap C'$. Hence, if there is only one point in $C \cap C'$, then $b = 0$. But this one
point must be $A$, so that $A = (a, 0)$. Thus, $A$ lies on the x-axis, and $A$, 0, and
1 are collinear. Conversely, if $C \cap C'$ has more than one point, then $C \cap C' =
\{A, B\} \neq \{A\}$. Thus, $B = (a, -b) \neq (a, b) = A$, so that $b \neq 0$ and, hence,
$A$, 0, and 1 are not collinear. •

Proposition 6.36. Let $\varphi : \mathbb{C} \to \mathbb{C}$ be an isometry fixing 0.

(i) There is some $\theta$ with $\varphi(1) = e^{i\theta}$. If $\varphi(1) = 1$, then $\varphi$ fixes the x-axis
pointwise, and $\varphi$ is either the identity or complex conjugation.

(ii) If $\varphi(1) \neq 1$, then $\varphi$ is either a rotation or a reflection. In more detail,
$\varphi : z \mapsto e^{i\theta}z$ when $\varphi$ is a rotation, and $\varphi : z \mapsto e^{i\theta}\bar{z}$ when $\varphi$ is a reflection.
In the latter case, the axis of $\varphi$ is $\ell = \{re^{i\theta/2} : r \in \mathbb{R}\}$.

In both cases, $\varphi \in O_2(\mathbb{R})$ and $\varphi$ is a linear transformation.

Proof. 

(i) Let $z \in \mathbb{R}$ be distinct from 0, and let $C_{|z|}$ be the circle with center 0 and radius
$|z| = |0z|$. Since $\varphi$ is an isometry fixing 0, we have $\varphi(C_{|z|}) = C_{|z|}$, for isometries
take circles to circles of the same radius: $\varphi(C[0; 0z]) = C[\varphi(0); \varphi(0)\varphi(z)] =
C[0; 0\varphi(z)]$. In particular, $1 \in C_1$ implies that $\varphi(1) \in C_1$, and so $\varphi(1) = e^{i\theta}$ for
some $\theta$.

Assume that $\varphi$ also fixes 1, and let $z \in \mathbb{R}$ be distinct from 0, 1. If $C =
C[0, z]$ and $C' = C[1, z]$, then $\varphi(C) = \varphi(C[0, z]) = C[0, \varphi(z)] = C$, because
$|0z| = |\varphi(0)\varphi(z)| = |0\varphi(z)|$; similarly, $\varphi(C') = C'$. Since 0, 1, $z$ are collinear,
Lemma 6.35 gives $\{z\} = C \cap C'$. Hence,

$$\{\varphi(z)\} = \varphi(C \cap C') = \varphi(C) \cap \varphi(C') = C \cap C' = \{z\}.$$

Therefore, $\varphi$ fixes $\mathbb{R}$ pointwise.

If $z \notin \mathbb{R}$, let $C$ be the circle with center 0 and radius $0z$ and let $C'$ be the circle
with center 1 and radius $1z$. Now $\varphi(C \cap C') = \varphi(C) \cap \varphi(C') = C \cap C'$. But
Lemma 6.35 says that $C \cap C' = \{z, \bar{z}\}$, so that either $\varphi(z) = z$ or $\varphi(z) = \bar{z}$.
If $\varphi(z) = z$ for some $z \notin \mathbb{R}$, then $\varphi$ fixes the basis 1, $z$ of the vector space $\mathbb{R}^2$
and, hence, $\varphi$ is the identity (because $\varphi$ is a linear transformation, by Proposition
2.57). Therefore, if $\varphi$ is not the identity, then $\varphi(z) = \bar{z}$ for all $z$.

(ii) If $\psi$ is rotation about 0 by $\theta$, then $\psi^{-1}\varphi$ is an isometry fixing both 0 and 1. By
part (i), $\psi^{-1}\varphi$ is either the identity or complex conjugation; that is, $\varphi(z) = e^{i\theta}z$
or $\varphi(z) = e^{i\theta}\bar{z}$.

If $\varphi(z) = e^{i\theta}z$, then Example 6.34(i) shows that $\varphi$ is a rotation.
If $\varphi : z \mapsto e^{i\theta}\bar{z}$, then

$$\varphi(re^{i\theta/2}) = e^{i\theta}\,\overline{re^{i\theta/2}} = re^{i\theta}e^{-i\theta/2} = re^{i\theta/2},$$

so that every point on $\ell$ is fixed by $\varphi$. If $z = re^{i\alpha} \notin \ell$, then $\varphi(z) = re^{i(\theta - \alpha)}$. In
Figure 6.1, the intersection of the line $L = L[z, \varphi(z)]$ with $\ell$ is denoted by $A$,
and the intersection of $L$ with the x-axis is denoted by $U$. Let us see that $\ell$ bisects
$\angle z0\varphi(z)$. Now $\angle U0\varphi(z) = \theta - \alpha$, so that $\angle z0\varphi(z) = \theta - 2\alpha = 2(\tfrac12\theta - \alpha)$;
hence, $\angle \varphi(z)0A = \tfrac12\theta - \alpha = \angle z0A$. Thus, $\triangle z0A$ is congruent to $\triangle \varphi(z)0A$,
because $|0\varphi(z)| = r = |0z|$, and so $|\varphi(z)A| = |Az|$. Finally, $\ell$ is perpendicular
to $L = L[\varphi(z), z]$, for $\angle 0A\varphi(z) = \angle 0Az$ and their sum is 180°. Therefore, $\varphi$ is
a reflection with axis $\ell$. •

Having classified all isometries that fix 0, let us now investigate arbitrary 
isometries. 


Corollary 6.37. If $\varphi$ is an isometry with $\varphi(0) = c$, then there is some $\theta$ so that

$$\varphi(z) = e^{i\theta}z + c \quad\text{or}\quad \varphi(z) = e^{i\theta}\bar{z} + c.$$

Proof. If $\varphi$ is a translation, say, $\varphi : z \mapsto z + c$, then $\varphi$ has the formula $\varphi(z) =
e^{i\theta}z + c$ with $\theta = 0$. In general, given an isometry $\varphi$, define $\tau$ to be translation
by $c = \varphi(0)$. Now $\tau^{-1}\varphi$ is an isometry fixing 0, and so it is either a rotation or
a reflection, by Proposition 6.36. •

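The isometry $z \mapsto e^{i\theta}z + c$ with $e^{i\theta} \neq 1$ is easily seen to be rotation by $\theta$
about its fixed point $c/(1 - e^{i\theta})$. Your first guess is that isometries of the form
$z \mapsto e^{i\theta}\bar{z} + c$ are reflections, but the next proposition shows that this is not
always true.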

Recall that $\theta$ is the direction of a nonzero complex number $z = re^{i\theta}$. Every
line $\ell$ has an equation of the form $z = v + re^{i\theta}$, where $r \in \mathbb{R}$; we say that $\ell$ has
direction $\theta$.


Proposition 6.38. The following statements are equivalent for an isometry with
equation $\varphi : z \mapsto e^{i\theta}\bar{z} + c$.

(i) $\varphi^2$ = identity.

(ii) $e^{i\theta}\bar{c} + c = 0$.

(iii) $\varphi$ has a fixed point.

(iv) $\varphi$ has a line $\ell$ comprised of fixed points, and $\ell$ has direction $\theta/2$.

(v) $\varphi$ is a reflection.

Proof. 





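(i) $\Rightarrow$ (ii)

$$\begin{aligned}
\varphi^2(z) &= \varphi(e^{i\theta}\bar{z} + c) \\
&= e^{i\theta}\,\overline{(e^{i\theta}\bar{z} + c)} + c \\
&= e^{i\theta}(e^{-i\theta}z + \bar{c}) + c \\
&= z + e^{i\theta}\bar{c} + c.
\end{aligned}$$

Hence, $\varphi^2$ is the identity if and only if $e^{i\theta}\bar{c} + c = 0$.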

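(ii) $\Rightarrow$ (iii) If $\varphi$ were a reflection, then the midpoint $\tfrac12(z + \varphi(z))$ of the segment
with endpoints $z$ and $\varphi(z)$ would lie on the axis of $\varphi$ and, hence, would be fixed by $\varphi$.
In particular, the point $\tfrac12 c$ would be fixed [being the midpoint of 0 and $\varphi(0) = c$]. Indeed,
$\varphi(\tfrac12 c) = e^{i\theta}\tfrac12\bar{c} + c = \tfrac12(e^{i\theta}\bar{c} + c) + \tfrac12 c = \tfrac12 c$, because $e^{i\theta}\bar{c} + c = 0$.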

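(iii) $\Rightarrow$ (iv) Suppose that $\varphi(u) = u$. Let $\ell$ be the line $\ell = \{u + re^{i\theta/2} : r \in \mathbb{R}\}$;
it is clear that $\ell$ has direction $\theta/2$. If $z \in \ell$, then

$$\begin{aligned}
\varphi(z) &= \varphi(u + re^{i\theta/2}) \\
&= e^{i\theta}\,\overline{(u + re^{i\theta/2})} + c \\
&= e^{i\theta}\bar{u} + re^{i\theta}e^{-i\theta/2} + c \\
&= (e^{i\theta}\bar{u} + c) + re^{i\theta/2} \\
&= \varphi(u) + re^{i\theta/2} \\
&= u + re^{i\theta/2} \\
&= z.
\end{aligned}$$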

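(iv) $\Rightarrow$ (v) It suffices to show that $\varphi$ is a reflection with axis $\ell$; since $\varphi$ fixes
every point on $\ell$, it suffices to show, for each $z \notin \ell$, that $\ell$ is the perpendicular-
bisector of the segment with endpoints $z$, $\varphi(z)$. If $\psi : z \mapsto e^{i\theta}\bar{z}$, then we saw, in
Proposition 6.36, that $\psi$ is the reflection with axis $\ell' = \{re^{i\theta/2} : r \in \mathbb{R}\}$. Hence,
$\ell'$ is the perpendicular-bisector of each segment with endpoints $z - \tfrac12 c$, $\psi(z - \tfrac12 c)$.
If we define $\tau$ to be translation by $\tfrac12 c$, then $\ell = \tau(\ell')$ is the perpendicular-bisector
of the segment with endpoints $\tau(z - \tfrac12 c)$, $\tau(\psi(z - \tfrac12 c))$. But $\tau(z - \tfrac12 c) = z$, and

$$\begin{aligned}
\tau(\psi(z - \tfrac12 c)) &= e^{i\theta}\,\overline{(z - \tfrac12 c)} + \tfrac12 c \\
&= e^{i\theta}\bar{z} - \tfrac12 e^{i\theta}\bar{c} + \tfrac12 c \\
&= [e^{i\theta}\bar{z} + c] - c - \tfrac12 e^{i\theta}\bar{c} + \tfrac12 c \\
&= \varphi(z) - \tfrac12(e^{i\theta}\bar{c} + c) \\
&= \varphi(z),
\end{aligned}$$

where the last equality holds because $e^{i\theta}\bar{c} + c = 0$: the computation in (i) $\Rightarrow$ (ii)
shows that $\varphi^2$ is translation by $e^{i\theta}\bar{c} + c$, and $\varphi^2$ fixes every point of $\ell$.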




(v) $\Rightarrow$ (i) The square of a reflection is the identity.
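The equivalences in Proposition 6.38 can be spot-checked numerically. The sketch below is an illustration only: the names `opposite`, `obstruction`, and `close` are hypothetical helpers, and the particular $c$ is chosen perpendicular to the direction $\theta/2$ so that condition (ii) holds; it then checks (i), a fixed point at $\tfrac12 c$, and a few points of the line of fixed points.

```python
import cmath

def opposite(theta, c):
    """phi: z -> e^{i*theta} * conj(z) + c."""
    u = cmath.exp(1j * theta)
    return lambda z: u * z.conjugate() + c

def obstruction(theta, c):
    """The quantity e^{i*theta} * conj(c) + c from statement (ii)."""
    return cmath.exp(1j * theta) * c.conjugate() + c

def close(a, b, tol=1e-12):
    """Numerical equality of two complex numbers."""
    return abs(a - b) < tol

theta = 1.1
direction = cmath.exp(1j * theta / 2)

# c is perpendicular to the direction theta/2, which forces (ii) to hold.
c = 0.8j * direction
phi = opposite(theta, c)

# Points u + r * e^{i*theta/2} with u = c/2, as in statement (iv).
fixed_line = [c / 2 + r * direction for r in (-2.0, 0.0, 3.5)]
```

With a different constant, for instance `c = 1 + 0j`, the value of `obstruction(theta, c)` is nonzero, and the corresponding map is not a reflection.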
Example 6.39.

We observe that reflections and translations need not commute. Let $\sigma : z \mapsto \bar{z}$
be complex conjugation, and let $\tau : z \mapsto z + i$ be translation by $i$. Now $\sigma\tau(z) =
\overline{z + i} = \bar{z} - i$, while $\tau\sigma(z) = \bar{z} + i$. ◄

Let us now analyze isometries $\varphi : z \mapsto e^{i\theta}\bar{z} + c$ that are not reflections.


Proposition 6.40. If $\varphi : z \mapsto e^{i\theta}\bar{z} + c$ is not a reflection, then $\varphi = \tau\rho$, where $\rho$
is a reflection, say, with axis $\ell$, and $\tau$ is a translation $z \mapsto z + \tfrac12 w$, where $w$ has
direction that of $\ell$.

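Proof. As in the proof of (i) $\Rightarrow$ (ii) in Proposition 6.38, we have $\varphi^2(z) =
z + e^{i\theta}\bar{c} + c$. We define $w = e^{i\theta}\bar{c} + c$, so that

$$\varphi^2 : z \mapsto z + w. \tag{1}$$

Now define

$$\tau : z \mapsto z + \tfrac12 w,$$

so that $\tau^2 = \varphi^2$. Note first that

$$e^{i\theta}\bar{w} = e^{i\theta}\,\overline{(e^{i\theta}\bar{c} + c)} = w. \tag{2}$$

It follows that $w$ has direction $\tfrac12\theta$: if $w = re^{i\alpha}$, then substituting in
Eq. (2) gives $re^{i\alpha} = re^{i\theta}e^{-i\alpha}$. Hence, $e^{2i\alpha} = e^{i\theta}$, so that $\alpha = \tfrac12\theta$.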

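We claim that $\tau$ commutes with $\varphi$.

$$\begin{aligned}
\varphi(\tau(z)) &= \varphi(z + \tfrac12 w) \\
&= e^{i\theta}\,\overline{(z + \tfrac12 w)} + c \\
&= e^{i\theta}\bar{z} + c + \tfrac12 e^{i\theta}\bar{w} \\
&= \varphi(z) + \tfrac12 w \\
&= \tau(\varphi(z)).
\end{aligned}$$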


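It follows that $\varphi$ commutes with $\tau^{-1}$:

$$\varphi\tau^{-1} = \tau^{-1}(\tau\varphi)\tau^{-1} = \tau^{-1}(\varphi\tau)\tau^{-1} = \tau^{-1}\varphi.$$

But $\tau^2 = \varphi^2$, so that

$$(\tau^{-1}\varphi)^2 = (\tau^{-1})^2\varphi^2 = \text{identity}.$$

Hence, if we define $\rho = \tau^{-1}\varphi$, then $\rho^2$ = identity and

$$\rho(z) = \tau^{-1}\varphi(z) = e^{i\theta}\bar{z} + (c - \tfrac12 w).$$

Proposition 6.38 now says that $\rho$ is a reflection whose axis has direction $\tfrac12\theta$,
which we have already observed is the direction of $w$. •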
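The construction in this proof is explicit enough to compute. The following sketch, with the hypothetical helper name `glide_decomposition` and arbitrarily chosen sample values of $\theta$ and $c$, builds $w$, $\tau$, and $\rho = \tau^{-1}\varphi$ and lets one check numerically that $\varphi = \tau\rho$, that $\rho^2$ is the identity, and that Eq. (2) holds.

```python
import cmath

def glide_decomposition(theta, c):
    """For phi: z -> e^{i*theta} * conj(z) + c, return (w, tau, rho)
    with phi = tau o rho, following the proof of Proposition 6.40."""
    u = cmath.exp(1j * theta)
    w = u * c.conjugate() + c                        # phi^2 is translation by w
    tau = lambda z: z + w / 2                        # translation by w/2
    rho = lambda z: u * z.conjugate() + (c - w / 2)  # rho = tau^{-1} o phi
    return w, tau, rho

# Sample data (an arbitrary choice for illustration).
theta, c = 0.6, 1 + 2j
u = cmath.exp(1j * theta)
phi = lambda z: u * z.conjugate() + c
w, tau, rho = glide_decomposition(theta, c)
```

For these sample values $w \neq 0$, so `phi` is a glide-reflection rather than a reflection, and `cmath.phase(w)` agrees with $\theta/2$.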
Figure 6.2 Glide Reflection


Definition. An isometry $\varphi$ is a glide-reflection if $\varphi = \tau_v\rho$, where $\rho$ is a
reflection with axis $\ell$ and $\tau_v$ is a translation with $v$ having the same direction as $\ell$.
Thus,

$$\varphi(z) = e^{i\theta}\bar{z} + v = e^{i\theta}\bar{z} + re^{i\theta/2}$$

for some nonzero $r \in \mathbb{R}$.

Glide-reflections are precisely the isometries described in Proposition 6.40.
We note that glide-reflections $\varphi$ are not reflections, for $\varphi^2$ is not the identity.

Example 6.41.

The isometry $\varphi : z \mapsto \bar{z} + 1$ is a glide-reflection taking the x-axis to itself:
$\varphi(\mathbb{R}) = \mathbb{R}$. If $\Delta$ is the triangle with vertices $(0, 0)$, $(\tfrac12, 0)$, $(1, 1)$, then $\varphi(\Delta)$ is
the triangle with vertices $(1, 0)$, $(\tfrac32, 0)$, $(2, -1)$, $\varphi^2(\Delta)$ has vertices $(2, 0)$, $(\tfrac52, 0)$,
$(3, 1)$, and $\varphi^n(\Delta)$ has vertices $(n, 0)$, $(n + \tfrac12, 0)$, $(n + 1, (-1)^n)$. The design
in Figure 6.2, which goes on infinitely to the left and to the right, is invariant
under $\varphi$. ◄
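The vertex lists in Example 6.41 can be reproduced with exact rational arithmetic. This is a small sketch; the function names `glide` and `iterate` are illustrative, and points are stored as coordinate pairs $(x, y)$ rather than as complex numbers so that fractions stay exact.

```python
from fractions import Fraction

def glide(point):
    """phi: z -> conj(z) + 1, written on coordinates as (x, y) -> (x + 1, -y)."""
    x, y = point
    return (x + 1, -y)

def iterate(vertices, n):
    """Apply the glide-reflection n times to every vertex."""
    for _ in range(n):
        vertices = [glide(v) for v in vertices]
    return vertices

half = Fraction(1, 2)
triangle = [(Fraction(0), Fraction(0)), (half, Fraction(0)), (Fraction(1), Fraction(1))]

first_image = iterate(triangle, 1)    # vertices of phi(triangle)
second_image = iterate(triangle, 2)   # vertices of phi^2(triangle)
```

The third vertex alternates between height $1$ and $-1$, matching the factor $(-1)^n$ in the general formula.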

The following statement summarizes our work so far. 

Theorem 6.42. Every isometry is either a translation, a rotation, a reflection, 
or a glide-reflection. 

Proof. The theorem follows from Proposition 6.36(ii), Corollary 6.37, Propo- 
sition 6.38, and Proposition 6.40. • 

Proposition 6.43. Let $\varphi \in \mathrm{Isom}(\mathbb{R}^2)$.

(i) If $\varphi$ has no fixed points, then $\varphi$ is either a translation or a glide-reflection.

(ii) If $\varphi$ has only one fixed point, then $\varphi$ is a rotation.

(iii) If $\varphi$ has more than one fixed point, then $\varphi$ is either a reflection or the
identity.

Proof. There are only four types of isometry, by Theorem 6.42: translations,
which (other than the identity) have no fixed points; rotations, which have exactly
one fixed point; reflections, which have infinitely many fixed points, namely, every
point on their axes; glide-reflections. It suffices to show that a glide-reflection $\varphi$
has no fixed points. If $\varphi(z) = z$, then $\varphi^2(z) = z$; but $\varphi^2 = \tau$, where $\tau \neq$ identity is
a translation, by Eq. (1), contradicting the fact that translations have no fixed points. •


Example 6.44.

We use Theorem 6.42 to determine the elements of finite order in $\mathrm{Isom}(\mathbb{R}^2)$.
Translations (other than the identity) have infinite order, as do glide-reflections
(for the square of a glide-reflection is a translation). All reflections have order 2.
Finally, suppose that $\varphi : z \mapsto e^{i\theta}z + c$ is a rotation, so that $e^{i\theta} \neq 1$. By induction, we
see that

$$\varphi^n(z) = e^{ni\theta}z + c(1 + e^{i\theta} + e^{2i\theta} + \cdots + e^{(n-1)i\theta}).$$

Hence, if $\varphi^n$ = identity, then $e^{ni\theta} = 1$, and so we must have $\theta = 2\pi k/n$ for some
integer $k$. Conversely, if $\theta = 2\pi k/n$, then $e^{i\theta} \neq 1$ is an $n$th root of unity, and so
$1 + e^{i\theta} + e^{2i\theta} + \cdots + e^{(n-1)i\theta} = 0$; in this case, $\varphi^n(z) = z$ whatever the value
of $c$. Therefore, a rotation has finite order if and only if $\theta$ is a rational multiple
of $2\pi$; for example, $z \mapsto e^{2\pi i/n}z$ has order $n$.

Are there any elements of order 2 besides reflections? Such an isometry $\varphi$
must have the form $z \mapsto e^{\pi i}z + c$; that is, $\varphi(z) = -z + c$; it is called a half-turn.
Note that half-turns are not reflections, for a reflection has infinitely many fixed
points while a half-turn, being a rotation, has only one fixed point. A half-turn
reverses the direction of a line. For example, $\varphi : z \mapsto -z + 2$ takes

· · · → → → → → · · ·

to

· · · ← ← ← ← ← · · ·

The reader should check that a half-turn $\varphi$ turns a figure upside down. For
example, $\varphi(\vee) = \wedge$ and $\varphi(\wedge) = \vee$. ◄
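The formula for $\varphi^n$ in Example 6.44 lends itself to a quick numerical experiment. In the sketch below the names `affine_rotation`, `iterate`, and `closed_form` are hypothetical helpers, and the values of $n$ and $c$ are arbitrary; the point $c/(1 - e^{i\theta})$ is computed on the assumption that $e^{i\theta} \neq 1$, in which case solving $e^{i\theta}z + c = z$ shows it is the unique fixed point.

```python
import cmath
import math

def affine_rotation(theta, c):
    """phi: z -> e^{i*theta} * z + c (a rotation when e^{i*theta} != 1)."""
    u = cmath.exp(1j * theta)
    return lambda z: u * z + c

def iterate(phi, n, z):
    """Apply phi to z a total of n times."""
    for _ in range(n):
        z = phi(z)
    return z

def closed_form(theta, c, n, z):
    """phi^n(z) = e^{n*i*theta} z + c (1 + e^{i*theta} + ... + e^{(n-1)*i*theta})."""
    u = cmath.exp(1j * theta)
    return u**n * z + c * sum(u**k for k in range(n))

n = 5
theta = 2 * math.pi / n
c = 3 - 2j
phi = affine_rotation(theta, c)
center = c / (1 - cmath.exp(1j * theta))   # the unique fixed point of phi
half_turn = affine_rotation(math.pi, 2)    # z -> -z + 2, which fixes z = 1
```

Iterating `phi` five times returns every point to its start (up to rounding), and iterating `half_turn` twice does the same, in line with half-turns having order 2.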


Recall that if $z_1, \ldots, z_n$ are distinct points in $\mathbb{C}$, then their center of gravity
is $u$, where

$$u = \tfrac1n(z_1 + \cdots + z_n).$$

Lemma 6.45. Let $\varphi \in \mathrm{Isom}(\mathbb{R}^2)$, and let $z_1, \ldots, z_n$ be distinct points in $\mathbb{C}$.
Then $\varphi(u) = u'$, where $u$ is the center of gravity of $z_1, \ldots, z_n$ and $u'$ is the
center of gravity of $\varphi(z_1), \ldots, \varphi(z_n)$.

Proof. By Theorem 6.42, $\varphi$ is either a translation, a rotation, a reflection, or
a glide-reflection. A rotation $z \mapsto e^{i\theta}z + c$ is a composite $\tau\rho$, where $\tau$ is the
translation $z \mapsto z + c$ and $\rho$ is a rotation around 0. Proposition 6.40 shows that a
glide-reflection is also a composite of a translation and a reflection, while every
reflection is a composite of a translation and a reflection whose axis passes
through 0. We conclude that it suffices to show that $\varphi(u) = u'$ for $\varphi$ a translation,
a rotation about the origin, or a reflection with axis passing through the origin
(so that 0 is fixed in either case).

Suppose that $\varphi$ is a translation: $\varphi(z) = z + a$. Then

$$\begin{aligned}
\varphi(u) &= u + a \\
&= \tfrac1n(z_1 + \cdots + z_n) + a \\
&= \tfrac1n z_1 + \cdots + \tfrac1n z_n + \tfrac1n a + \cdots + \tfrac1n a \\
&= \tfrac1n(z_1 + a) + \cdots + \tfrac1n(z_n + a) \\
&= \tfrac1n\varphi(z_1) + \cdots + \tfrac1n\varphi(z_n) \\
&= u'.
\end{aligned}$$

If $\varphi$ is either a rotation about the origin or a reflection with axis through 0,
then Proposition 6.36 shows that $\varphi$ is a linear transformation. Therefore,

$$\varphi(u) = \varphi(\tfrac1n[z_1 + \cdots + z_n]) = \tfrac1n[\varphi(z_1) + \cdots + \varphi(z_n)] = u'.$$

•

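Lemma 6.46. If $G \le \mathrm{Isom}(\mathbb{R}^2)$ is a finite subgroup, then there is $u \in \mathbb{C}$ with
$\varphi(u) = u$ for all $\varphi \in G$.

Proof. Choose $z \in \mathbb{C}$, and let $\mathcal{O}$ be its orbit:

$$\mathcal{O} = \{\varphi(z) : \varphi \in G\}.$$

Since $G$ is finite, $\mathcal{O}$ is finite: $\mathcal{O} = \{z_1, \ldots, z_n\}$, where $z_1 = z$. Now $G$ acts on $\mathcal{O}$,
for if $\psi \in G$ and $z_j = \varphi(z_1)$, then $\psi(z_j) = \psi\varphi(z_1) \in \mathcal{O}$, because $\psi\varphi \in G$. Thus, each $\varphi \in G$
permutes $\mathcal{O}$, for $\varphi$ is injective and $\varphi : \mathcal{O} \to \mathcal{O}$. Since $\varphi$ permutes $\mathcal{O}$, the center
of gravity $u$ of $\mathcal{O}$ is equal to the center of gravity of $\varphi(\mathcal{O}) = \mathcal{O}$. Therefore,
Lemma 6.45 says that $\varphi(u) = u$ for all $\varphi \in G$. •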
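The orbit argument can be watched in action on a concrete finite group. The sketch below is an illustration under stated assumptions: it builds, for a chosen point $p$, the group of rotations by multiples of $2\pi/n$ about $p$ together with the reflections in lines through $p$ (a conjugate of a dihedral group), storing each element as a triple `(u, eps, c)` meaning $z \mapsto uz^{\epsilon} + c$ with $z^{-1} = \bar{z}$. The names `make_group`, `apply`, and `centroid` are hypothetical.

```python
import cmath
import math

def make_group(n, p):
    """Rotations by 2*pi*k/n about p and reflections in lines through p.
    Each element is (u, eps, c), meaning z -> u * z^eps + c, where
    z^1 = z and z^(-1) = conj(z)."""
    elems = []
    for k in range(n):
        u = cmath.exp(2j * math.pi * k / n)
        elems.append((u, 1, p - u * p))               # rotation fixing p
        elems.append((u, -1, p - u * p.conjugate()))  # reflection fixing p
    return elems

def apply(g, z):
    """Evaluate the isometry g at the point z."""
    u, eps, c = g
    return u * (z if eps == 1 else z.conjugate()) + c

def centroid(points):
    """Center of gravity of a list of complex numbers."""
    return sum(points) / len(points)

p = 2 + 1j
G = make_group(4, p)
z = 5 - 3j                       # an arbitrary starting point
orbit = [apply(g, z) for g in G]
u = centroid(orbit)              # candidate common fixed point
```

For this group the center of gravity of the orbit comes out as the point $p$ itself, which every element fixes by construction.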

In his book Symmetry, H. Weyl attributes the following theorem to Leonardo 
da Vinci (1452-1519). 
Theorem 6.47 (Leonardo). If $G \le \mathrm{Isom}(\mathbb{R}^2)$ is a finite subgroup, then either
$G \cong \mathbb{I}_m$ for some $m$ or $G \cong D_{2n}$ for some $n$.

Proof. By Lemma 6.46, there is $c \in \mathbb{C}$ with $\varphi(c) = c$ for all $\varphi \in G$. If
$\tau : z \mapsto z - c$, then $\tau\varphi\tau^{-1}(0) = \tau\varphi(c) = \tau(c) = 0$. Since $\tau G\tau^{-1} \cong G$, we
may assume that every $\varphi \in G$ fixes 0. Thus, Proposition 6.36 applies: we may
assume that $G \le O_2(\mathbb{R})$, and so every $\varphi \in G$ is a linear transformation. Better,
we may assume that every $\varphi \in G$ is either a rotation or a reflection.

Suppose that $G$ contains no reflections. Thus, the elements of $G$ are rotations
$R_{\theta_1}, \ldots, R_{\theta_m}$, where $\theta_j = 2\pi k_j/n_j$, by Example 6.44. If $n = \max_j\{n_j\}$, then
$G \le \langle R_{2\pi/n}\rangle$. Therefore, $G$, being a subgroup of a cyclic group, is itself cyclic.

Suppose that $G$ contains a reflection $\rho$. By Exercise 6.39 on page 515, we
may replace $G$ by an isomorphic copy which contains $\sigma$, complex conjugation.
The subset $H$, consisting of all the rotations in $G$, is a subgroup; it is a finite
subgroup of $\mathrm{Isom}(\mathbb{R}^2)$ containing no reflections, and so it is cyclic, say, $H = \langle h\rangle$,
where $h(z) = e^{i\theta}z$ has order $n$, say. Now $\sigma h\sigma^{-1} = h^{-1}$, for

$$\sigma h\sigma^{-1} : z \mapsto \bar{z} \mapsto e^{i\theta}\bar{z} \mapsto \overline{e^{i\theta}\bar{z}} = e^{-i\theta}z = h^{-1}(z).$$

Therefore, $\langle h, \sigma\rangle = H \cup H\sigma$ is a subgroup isomorphic to $D_{2n}$. Finally, we
claim that $\langle h, \sigma\rangle = G$. If $r \in G$ is a reflection, then $r(z) = e^{i\alpha}\bar{z} = R_\alpha\sigma(z)$. But
$R_\alpha = r\sigma^{-1} \in H$, for it is a rotation in $G$, and so $r = R_\alpha\sigma \in \langle h, \sigma\rangle$. •

Leonardo’s theorem has found all the finite subgroups of $O_2(\mathbb{R})$ that stabilize
a point. We are now going to find all those subgroups of $\mathrm{Isom}(\mathbb{R}^2)$, called frieze
groups, that stabilize a line but not a point. To isomorphism, there are only four
such subgroups, but when we take geometry into account, we shall see seven
types of them.

According to the Oxford English Dictionary, a frieze is “that member in the 
entablature of an order which comes between the architrave and the cornice.” 
Fortunately, it goes on to say that a frieze is “any horizontal broad band which 
is occupied by a sculpture.” Now sculptures are three-dimensional, but we use 
the word to mean any (two-dimensional) broad band which repeats some pattern 
infinitely to the left and to the right. In more precise language, we say that a 
subset $F$ of the plane is a band if there is some isometry $\varphi$ (not the identity) in
the symmetry group $\Sigma(F)$ which stabilizes a line $\ell$; that is, $\varphi(\ell) = \ell$ (we do not
insist that $\varphi$ fix $\ell$ pointwise). To say that a band $F$ is a frieze means that there is
some “design” $D \subset F$ so that $F = \bigcup_{n \in \mathbb{Z}} \tau^n(D)$ for some translation $\tau \in \Sigma(F)$.
We aim to classify all those subgroups of $\mathrm{Isom}(\mathbb{R}^2)$ of the form $\Sigma(F)$ for some
frieze $F$.

The band $F$ in Figure 6.3 is a frieze: it is stabilized by the translation $\tau : z \mapsto
z + 1$, and its repeating pattern is the triangle $D$ having base the closed interval
$[0, 1]$.
Consider the band $F'$ in Figure 6.2. It is easy to see that its symmetry group,
$\Sigma(F')$, contains the glide-reflection $\varphi : z \mapsto \bar{z} + 1$. Note that $\varphi(\mathbb{R}) = \mathbb{R}$ and that
$F' = \bigcup_{n \in \mathbb{Z}} \varphi^n(D)$, where $D$ is the triangle with base $[0, \tfrac12]$. This does not show
that $F'$ is a frieze because $\varphi$ is not a translation. However, $F'$ is, indeed, a frieze,
for the translation $\tau : z \mapsto z + 2$ lies in $\Sigma(F')$ and $F' = \bigcup_{n \in \mathbb{Z}} \tau^n(D')$, where
$D'$ is the union of the triangle with base $[0, \tfrac12]$ and the triangle with base $[1, \tfrac32]$.



Figure 6.4 Persian bowmen 


Now consider a frieze $F''$ obtained from $F$ in Figure 6.3 by replacing the
triangle $D$ with base $[0, 1]$ by another figure. For example, let $F''$ be the frieze
in Figure 6.4 (from the palace of Darius in ancient Susa) in which each triangle $D$
in Figure 6.3 has been replaced by a Persian bowman. It is clear that $\Sigma(F'') =
\Sigma(F)$. Plainly, there are too many friezes to classify them geometrically; for
example, what restrictions, if any, must be imposed on $D$? In spite of this, we
are still able to classify friezes if we do not distinguish triangles and bowmen.

Notation. The subgroup of all the translations in $\mathrm{Isom}(\mathbb{R}^2)$ is denoted by
$\mathrm{Trans}(\mathbb{R}^2)$.

Informally, a frieze group is the symmetry group of a frieze. We will soon 
replace the following definition by a normalized version. 

Definition 1. A frieze group is a subgroup $G$ of $\mathrm{Isom}(\mathbb{R}^2)$ which stabilizes a
line $\ell$, that is, $\varphi(\ell) = \ell$ for all $\varphi \in G$, and such that $G \cap \mathrm{Trans}(\mathbb{R}^2)$ is infinite
cyclic.

Saying that each $\varphi \in G$ stabilizes a line $\ell$ reflects the fact that a frieze is a
band; saying that $G \cap \mathrm{Trans}(\mathbb{R}^2) = \langle\tau\rangle$ is infinite cyclic reflects the fact that a
frieze $F$ has some repeating design $D \subset F$ whose $\langle\tau\rangle$-orbit is all of $F$.

Lemma 6.48. If $\varphi \in G$, where $G$ is a frieze group, then there is some real
number $c$ such that one of the following holds:

(i) If $\varphi$ is a translation, then $\varphi(z) = z + c$.

(ii) If $\varphi$ is a rotation, then $\varphi$ is a half-turn: $\varphi(z) = -z + c$.

(iii) If $\varphi$ is a reflection, then $\varphi(z) = \bar{z}$ or $\varphi(z) = -\bar{z} + c$.

(iv) If $\varphi$ is a glide-reflection, then $\varphi : z \mapsto \bar{z} + c$, where $c \neq 0$.

Proof. We know that $\varphi : z \mapsto e^{i\theta}z + c$ or $\varphi : z \mapsto e^{i\theta}\bar{z} + c$. Since $\varphi(\mathbb{R}) = \mathbb{R}$,
we have $c = \varphi(0) \in \mathbb{R}$ and $\varphi(1) = e^{i\theta} + c \in \mathbb{R}$. Therefore, $e^{i\theta} \in \mathbb{R}$; that is,
$e^{i\theta} = \pm 1$. Thus, either $\varphi(z) = \pm z + c$ or $\varphi(z) = \pm\bar{z} + c$.

The remainder of the proof determines the type of isometry corresponding
to each of these formulas. Rotations by $\theta$ have the form $e^{i\theta}z + c$; since $e^{i\theta} =
\pm 1$, we must have $\theta = \pi$, and so rotations here are half-turns. The isometry
$\varphi : z \mapsto e^{i\theta}\bar{z} + c$ is a reflection if and only if $e^{i\theta}\bar{c} + c = 0$. Here, $c$ is real, so
that $\bar{c} = c$, and so $\varphi$ is a reflection if either $c = 0$ or if $e^{i\theta} = -1$. Thus, either
$\varphi(z) = \bar{z}$ or $\varphi(z) = -\bar{z} + c$ for any $c \in \mathbb{R}$. Finally, if $c \neq 0$ and $\varphi(z) = \bar{z} + c$,
then $e^{i\theta}\bar{c} + c = 2c \neq 0$ and $\varphi$ is a glide-reflection. •
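The case analysis in this proof amounts to a small decision procedure. The sketch below encodes it under the lemma's hypotheses (the stabilized line is the real axis, so the coefficient is $\pm 1$ and $c$ is real); the function name `classify` and its argument convention are illustrative assumptions, not notation from the text.

```python
def classify(sign, conj, c):
    """Classify an isometry stabilizing the real axis.

    The isometry is z -> sign * z + c when conj is False, and
    z -> sign * conj(z) + c when conj is True; sign is +1 or -1 and
    c is a real number.  The identity is counted as a translation,
    as in Example 6.34(iii).
    """
    if sign not in (1, -1):
        raise ValueError("sign must be +1 or -1")
    if not conj:
        # z -> z + c is a translation; z -> -z + c is a half-turn.
        return "translation" if sign == 1 else "half-turn"
    # Here e^{i*theta} * conj(c) + c equals (sign + 1) * c, which
    # vanishes exactly when sign == -1 or c == 0.
    if sign == -1 or c == 0:
        return "reflection"
    return "glide-reflection"
```

For instance, `classify(1, True, 0.5)` describes $z \mapsto \bar{z} + \tfrac12$, and `classify(-1, True, 1)` describes $z \mapsto -\bar{z} + 1$.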

We are going to normalize the classification problem for frieze groups, in two
ways. First, there is no loss in generality in assuming that the stabilized line $\ell$ is
the real axis $\mathbb{R}$, for we may change the location of the coordinate axes without
disturbing symmetries. Second, we will ignore changes in scale. For example,
the frieze $F$ in Figure 6.3 has an infinite cyclic symmetry group, namely, $\Sigma(F) =
\langle\tau\rangle$, where $\tau$ is the translation $\tau : z \mapsto z + 1$. On the other hand, if each vector
in $\mathbb{R}^2$ is doubled in size, then $F$ becomes a new frieze $\Phi$ with $\Sigma(\Phi) = \langle\tau'\rangle$,
where $\tau' : z \mapsto z + 2$. Thus, $F$ and $\Phi$ are essentially the same frieze, differing
only in scale, but their symmetry groups are different because $\tau \notin \Sigma(\Phi)$. Define
$\omega : \mathbb{R}^2 \to \mathbb{R}^2$ by $\omega(z) = 2z$. Now $\omega$ defines the isomorphism $\Sigma(F) \to \Sigma(\Phi)$
given by $\varphi \mapsto \omega\varphi\omega^{-1}$. Note that

$$\omega\tau\omega^{-1} : z \mapsto \tfrac12 z \mapsto \tfrac12 z + 1 \mapsto 2(\tfrac12 z + 1) = z + 2.$$

Our second normalization assumes that the generator $\tau$ of $G \cap \mathrm{Trans}(\mathbb{R}^2)$ is the
translation $\tau : z \mapsto z + 1$. In light of the discussion so far, it suffices to classify
normalized frieze groups.

Definition 2. A normalized frieze group is a subgroup $G \le \mathrm{Isom}(\mathbb{R}^2)$ which
stabilizes $\mathbb{R}$ and such that $G \cap \mathrm{Trans}(\mathbb{R}^2) = \langle\tau\rangle$, where $\tau : z \mapsto z + 1$.

Lemma 6.48 simplifies when we assume frieze groups $G$ are normalized.
If $\gamma : z \mapsto \bar{z} + c$ is a glide-reflection in $G$, then $\gamma^2$ is a translation; in fact,
$\gamma^2 : z \mapsto z + 2c$. But all translations in $G$ lie in $\langle\tau\rangle$, so that $\gamma^2 = \tau^n : z \mapsto z + n$
for some $n \in \mathbb{Z}$. Hence, $2c = n$, so that $c = m$ or $c = m + \tfrac12$ for some $m \in \mathbb{Z}$.
Thus, $G$ contains $\tau^{-m}\gamma = \sigma$ if $c = m$, that is, $\sigma(z) = \bar{z}$, or $\tau^{-m}\gamma : z \mapsto \bar{z} + \tfrac12$ if
$c = m + \tfrac12$. In order to distinguish $\gamma \in G$ from $\sigma \in G$, we choose $\gamma : z \mapsto \bar{z} + \tfrac12$,
so that $\gamma^2 = \tau$. We may also normalize the half-turn $R$ and the reflection $\rho$ so
that $R : z \mapsto -z + 1$ and $\rho : z \mapsto -\bar{z} + 1$.

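If $\varphi \in \mathrm{Isom}(\mathbb{R}^2)$, let us write $\varphi(z) = e^{i\theta}z^{\epsilon} + c$, where $\epsilon = \pm 1$, $z^1 = z$, and
$z^{-1} = \bar{z}$. If $\psi(z) = e^{i\alpha}z^{\eta} + d$, then it is easy to see that

$$\varphi\psi(z) = e^{i(\theta + \epsilon\alpha)}z^{\epsilon\eta} + e^{i\theta}d^{\epsilon} + c.$$

It follows that the function $\pi : \mathrm{Isom}(\mathbb{R}^2) \to O_2(\mathbb{R})$, defined by

$$\varphi \mapsto \tau_{\varphi(0)}^{-1}\varphi,$$

is a homomorphism [of course, $\tau_{\varphi(0)}^{-1}\varphi : z \mapsto e^{i\theta}z^{\epsilon}$], and $\ker\pi = \mathrm{Trans}(\mathbb{R}^2)$, so
that $\mathrm{Trans}(\mathbb{R}^2) \lhd \mathrm{Isom}(\mathbb{R}^2)$.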
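The composition rule and the map $\pi$ can be checked by direct computation. The following sketch stores an isometry $z \mapsto e^{i\theta}z^{\epsilon} + c$ as a triple `(u, eps, c)` with `u` equal to $e^{i\theta}$; this representation and the names `make`, `power`, `apply`, `compose`, and `point_map` are assumptions made for illustration. The rule in `compose` is obtained by expanding $\varphi(\psi(z)) = e^{i\theta}(e^{i\alpha}z^{\eta} + d)^{\epsilon} + c$.

```python
import cmath

def make(theta, eps, c):
    """Return (u, eps, c) representing z -> u * z^eps + c with u = e^{i*theta}."""
    return (cmath.exp(1j * theta), eps, complex(c))

def power(z, eps):
    """z^1 = z and z^(-1) = conj(z), in the notation of the text."""
    return z if eps == 1 else z.conjugate()

def apply(phi, z):
    """Evaluate the isometry phi at z."""
    u, eps, c = phi
    return u * power(z, eps) + c

def compose(phi, psi):
    """phi after psi: coefficient e^{i(theta + eps*alpha)}, exponent eps*eta,
    constant e^{i*theta} * d^eps + c."""
    u, eps, c = phi
    v, eta, d = psi
    return (u * power(v, eps), eps * eta, u * power(d, eps) + c)

def point_map(phi):
    """pi: erase the constant of translation."""
    u, eps, _ = phi
    return (u, eps, 0j)

phi = make(0.4, -1, 1 + 2j)
psi = make(1.3, -1, -0.5j)
chi = make(-0.9, 1, 2)
```

Comparing `point_map(compose(phi, psi))` with `compose(point_map(phi), point_map(psi))` illustrates the homomorphism property, and `point_map` sends every translation to the identity triple.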

Definition. Let $\pi : \mathrm{Isom}(\mathbb{R}^2) \to O_2(\mathbb{R})$ be the map just defined (which erases
the constant of translation). If $G$ is a frieze group, then its point group is $\pi(G)$.

It follows from the second isomorphism theorem that if $T = G \cap \mathrm{Trans}(\mathbb{R}^2)$,
then $T \lhd G$ and $G/T \cong \pi(G)$.

Corollary 6.49. The point group $\pi(G)$ of a frieze group $G$ is a subgroup of
$\operatorname{im}\pi = \{1, f, g, h\} \le O_2(\mathbb{R})$ (which is isomorphic to the four-group $\mathbf{V}$), where
$f(z) = -z$, $g(z) = -\bar{z}$, and $h(z) = \bar{z}$.

Proof. By Lemma 6.48, we have $\operatorname{im}\pi = \{1, f, g, h\}$. •

We are now going to classify the (normalized) frieze groups. Since $\operatorname{im}\pi =
\langle f, g, h\rangle$ is isomorphic to the four-group, there are exactly 5 subgroups of it:
$\{1\}$, $\langle f\rangle$, $\langle g\rangle$, $\langle h\rangle$, $\langle f, g, h\rangle = \operatorname{im}\pi$. Thus, there are 5 point groups. We will
use Exercise 2.90 on page 188: Let $\pi : G \to H$ be a surjective homomorphism
with $\ker\pi = T$. If $H = \langle X\rangle$, and, for each $x \in X$, a lifting $g_x \in G$ is
chosen with $\pi(g_x) = x$, then $G$ is generated by $T \cup \{g_x : x \in X\}$. Here,
$T = G \cap \mathrm{Trans}(\mathbb{R}^2) = \langle\tau\rangle$, where $\tau : z \mapsto z + 1$.

Figure 6.5 Normalized Liftings


The following group will appear in the classification of the frieze groups.
Recall that the dihedral group $D_{2n}$ is defined as a group of order $2n$ generated by
two elements $a$ and $b$ such that $b^2 = 1$, $a^n = 1$, and $bab = a^{-1}$.

Definition. The infinite dihedral group $D_\infty$ is an infinite group generated by
two elements $a$ and $b$ such that $b^2 = 1$ and $bab = a^{-1}$.

Exercise 6.42 on page 515 shows that any two infinite dihedral groups are 
isomorphic. 

Theorem 6.50. There are at most 7 types of frieze groups G. 

Proof. We use the notation in Figure 6.6. 

Case 1. $\pi(G) = \{1\}$. In this case, $G = G_1 = \langle\tau\rangle$. Of course, $G_1 \cong \mathbb{Z}$.

Case 2. $\pi(G) = \langle f\rangle$. In this case, $G = G_2 = \langle\tau, R\rangle$. Now $R^2 = 1$ and
$R\tau R : z \mapsto z - 1$; that is, $R\tau R = \tau^{-1}$. Since $G_2$ is infinite (because $\tau$ has
infinite order), $G_2$ is infinite dihedral; that is, $G_2 \cong D_\infty$.

Case 3. $\pi(G) = \langle g\rangle$. In this case, $G_3 = \langle\tau, \rho\rangle$. Now $\rho^2 = 1$ and $\rho\tau\rho : z \mapsto
z - 1$; that is, $\rho\tau\rho = \tau^{-1}$. Therefore, $G_3 \cong D_\infty$.

The group $G_2$ is also infinite dihedral, so that $G_3 \cong G_2$, by Exercise 6.42 on
page 515; thus, $G_2$ and $G_3$ are, algebraically, the same. However, these groups
are geometrically distinct, for while $G_2$ contains only translations and half-turns,
the group $G_3$ contains a reflection.

Cases 4 and 5. $\pi(G) = \langle h\rangle$. There are two possible cases because there are two
possible liftings of $h$, namely, $\sigma$ and $\gamma$.

Case 4. $G_4 = \langle\tau, \sigma\rangle$. Now $\tau$ and $\sigma$ commute, for each of $\sigma\tau$ and $\tau\sigma$ sends
$z \mapsto \bar{z} + 1$, so that $G_4$ is abelian. Moreover, $\sigma^2 = 1$. It follows easily from
Proposition 2.125 that

$$G_4 = \langle\sigma\rangle \times \langle\tau\rangle \cong \mathbb{I}_2 \times \mathbb{Z}.$$

Case 5. $G_5 = \langle\tau, \gamma\rangle$. Note that $\gamma$ and $\tau$ commute, for each of $\gamma\tau$ and $\tau\gamma$ sends
$z \mapsto \bar{z} + \tfrac32$, so that $G_5$ is abelian. Since $\gamma^2 = \tau$, we have $G_5 = \langle\tau, \gamma\rangle = \langle\gamma\rangle$ is
cyclic with generator $\gamma$; that is, $G_5 \cong \mathbb{Z}$.

Algebraically, $G_5$ and $G_1$ are the same, for both are infinite cyclic. But these
groups are different geometrically, for $G_5$ contains a glide-reflection while $G_1$
has only translations.

Cases 6 and 7. $\pi(G) = \langle f, g, h\rangle$. Again, there are two possible cases because
of the two possible liftings of $h$. Note, in the four-group, that the product of any
two nonidentity elements is the third such, and so both $\langle\tau, R, \sigma\rangle$ and $\langle\tau, R, \gamma\rangle$
have point group $\langle f, g, h\rangle$.

Case 6. $G_6 = \langle\tau, R, \sigma\rangle$. Now $\sigma\tau = \tau\sigma$, as in Case 4, while $\sigma R = R\sigma : z \mapsto
-\bar{z} + 1$. It follows that both $\langle\sigma\rangle \lhd G_6$ and $\langle\tau, R\rangle \lhd G_6$. Since $\langle\sigma\rangle \cap \langle\tau, R\rangle = \{1\}$
and $G_6$ is generated by these two subgroups, Proposition 2.125 shows that $G_6 =
\langle\sigma\rangle \times \langle\tau, R\rangle$. By Case 2, $\langle\tau, R\rangle \cong D_\infty$, and so $G_6 \cong \mathbb{I}_2 \times D_\infty$.

Case 7. $G_7 = \langle\tau, R, \gamma\rangle$. Since $\gamma^2 = \tau$, we have $G_7 = \langle R, \gamma\rangle$. Now $R^2 = 1$
and $R\gamma R : z \mapsto \bar{z} - \tfrac12$, so that $R\gamma R = \gamma^{-1}$. Therefore, $G_7 \cong D_\infty$.

Algebraically, $G_7$, $G_2$ and $G_3$ are the same, for each is isomorphic to $D_\infty$.
But these groups are different geometrically, for neither $G_2$ nor $G_3$ contains a
glide-reflection (lest their point group be too big). •
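The relations used in Cases 2 through 7 can be verified mechanically with exact arithmetic. The sketch below makes one simplifying assumption: since every isometry in a normalized frieze group has the form $\pm z + c$ or $\pm\bar{z} + c$ with $c$ real, it is stored as a triple `(s, t, c)` acting on coordinates by $(x, y) \mapsto (sx + c, ty)$ with $s, t \in \{1, -1\}$. The names `compose`, `inverse`, `product`, and the upper-case constants are illustrative.

```python
from fractions import Fraction

# (s, t, c) stands for (x, y) -> (s*x + c, t*y), with s, t in {1, -1}.
def compose(p, q):
    """p after q."""
    s1, t1, c1 = p
    s2, t2, c2 = q
    return (s1 * s2, t1 * t2, s1 * c2 + c1)

def inverse(p):
    """Inverse isometry: solve x' = s*x + c for x, using s*s = 1."""
    s, t, c = p
    return (s, t, -s * c)

IDENT = (1, 1, Fraction(0))
TAU   = (1, 1, Fraction(1))       # tau:   z -> z + 1
R     = (-1, -1, Fraction(1))     # R:     z -> -z + 1        (half-turn)
RHO   = (-1, 1, Fraction(1))      # rho:   z -> -conj(z) + 1  (reflection)
SIGMA = (1, -1, Fraction(0))      # sigma: z -> conj(z)
GAMMA = (1, -1, Fraction(1, 2))   # gamma: z -> conj(z) + 1/2 (glide-reflection)

def product(*factors):
    """Composite of the factors, the leftmost applied last."""
    result = IDENT
    for f in factors:
        result = compose(result, f)
    return result
```

For example, `product(R, TAU, R)` equals `inverse(TAU)`, which is the relation $R\tau R = \tau^{-1}$ of Case 2, and `product(SIGMA, R)` comes out equal to `RHO`, in agreement with the formula $\sigma R : z \mapsto -\bar{z} + 1$ in Case 6.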


Theorem 6.51. Each of the 7 possible frieze groups occurs. 

Proof. Each of the friezes illustrated in Figure 6.6 has the indicated group of
symmetries. One should view each frieze as being bisected by the x-axis, so that
half of each letter is above the axis and half below. For example, $F_4$ is stabilized
by $\sigma$ but not by $\gamma$. To prove this theorem, we consider each of the (normalized)
isometries $\tau$, $R$, $\rho$, $\sigma$, and $\gamma$, and show that a given frieze is stabilized by some
of these and not stabilized by the others.

We remind the reader of the geometric view of the basic isometries. The
translation $\tau$ is a shift one unit to the right, while $\sigma$ is the reflection in the x-axis
and $\rho$ is the reflection in a vertical line (the line $x = \tfrac12$ for the normalized
$\rho : z \mapsto -\bar{z} + 1$). The glide-reflection $\gamma$ reflects in the x-axis
and then shifts half a unit to the right, while the half-turn $R$ turns a frieze upside
down.

Figure 6.6 The Seven Friezes

(i) We have $\Sigma(F_1) = \langle\tau\rangle$, because $\tau(F_1) = F_1$, but none of the other isometries
stabilize it. Therefore, $G_1$ is a frieze group.

(ii) We have $\Sigma(F_2) = \langle\tau, R\rangle$, because $\tau$, $R$ stabilize $F_2$, but $\rho$, $\sigma$, and $\gamma$ do not
stabilize it. Therefore, $G_2$ is a frieze group.

(iii) We have $\Sigma(F_3) = \langle\tau, \rho\rangle$, because $\tau$, $\rho$ stabilize $F_3$, but $R$, $\sigma$, and $\gamma$ do not
stabilize it. Therefore, $G_3$ is a frieze group.

(iv) We have $\Sigma(F_4) = \langle\tau, \sigma\rangle$, because $\tau$, $\sigma$ stabilize $F_4$, but $R$, $\rho$, and $\gamma$ do not
stabilize it. Therefore, $G_4$ is a frieze group.

(v) We have $\Sigma(F_5) = \langle\tau, \gamma\rangle$, because $\tau$, $\gamma$ stabilize $F_5$, but $R$, $\rho$, and $\sigma$ do not
stabilize it. Therefore, $G_5$ is a frieze group.

(vi) We have $\Sigma(F_6) = \langle\tau, R, \sigma\rangle$, because all the isometries except $\gamma$ stabilize it.
Therefore, $G_6$ is a frieze group.

(vii) We have $\Sigma(F_7) = \langle\tau, R, \gamma\rangle = \langle R, \gamma\rangle$, because all the isometries except $\sigma$
stabilize $F_7$. Therefore, $G_7$ is a frieze group. •

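Corollary 6.52. Up to isomorphism, there are 4 frieze groups; namely, $\mathbb{Z}$, $D_\infty$,
$\mathbb{I}_2 \times \mathbb{Z}$, and $\mathbb{I}_2 \times D_\infty$.

Proof. As stated in the proof of Theorem 6.50, $\Sigma(F_1)$ and $\Sigma(F_5)$ are isomorphic
to $\mathbb{Z}$; $\Sigma(F_2)$, $\Sigma(F_3)$, and $\Sigma(F_7)$ are isomorphic to $D_\infty$; $\Sigma(F_4)$ is isomorphic
to $\mathbb{I}_2 \times \mathbb{Z}$; and $\Sigma(F_6) \cong \mathbb{I}_2 \times D_\infty$. •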

The next question is the classification of the wallpaper groups. Let
$B_r(u) = \{v \in \mathbb{R}^2 : |v - u| < r\}$ be the open ball with center $u$ and radius $r$. Of course, a
subgroup $G \le \mathrm{Isom}(\mathbb{R}^2)$ acts on $\mathbb{R}^2$, and so the orbit $\mathcal{O}(u)$ of any point $u \in \mathbb{R}^2$
makes sense: $\mathcal{O}(u) = \{\varphi(u) : \varphi \in G\}$. A subgroup $G \le \mathrm{Isom}(\mathbb{R}^2)$ is discrete
if, for each $u \in \mathbb{R}^2$, there is $r > 0$ so that $B_r(u) \cap \mathcal{O}(u) = \{u\}$. One can prove
that frieze groups are those discrete subgroups of $\mathrm{Isom}(\mathbb{R}^2)$ which stabilize a
line but not a point (the point groups stabilize a point). Wallpaper groups are
those discrete subgroups $G$ of $\mathrm{Isom}(\mathbb{R}^2)$ which do not stabilize a line or a point.
If $G$ is a wallpaper group, then the homomorphism $\pi : G \to O_2(\mathbb{R})$ has kernel
$\mathrm{Trans}(\mathbb{R}^2) \cap G$, which is now a free abelian group $\mathbb{Z} \times \mathbb{Z}$. The image of $\pi$, still
called a point group, must be one of $\mathbb{I}_n$, $D_{2n}$, where $n \in \{1, 2, 3, 4, 6\}$ (this is
the so-called crystallographic restriction). We refer the interested reader to the
final chapter of Burn, Groups: A Path to Geometry, where it is shown that there
are exactly 17 wallpaper groups.

Similar problems exist in 3-dimensional space. One can classify the five
Platonic solids and give their isometry groups: the tetrahedron has symmetry
group $A_4$, the cube and the octahedron each has symmetry group $S_4$, and the
dodecahedron and icosahedron each has symmetry group $A_5$. Crystallographic
groups are defined to be the discrete subgroups $G \le \mathrm{Isom}(\mathbb{R}^3)$ which do not
stabilize a point, a line, or a plane. There is a homomorphism $G \to O_3(\mathbb{R})$, the group of all
orthogonal linear transformations on $\mathbb{R}^3$, which generalizes the homomorphism
$\pi$; its kernel, $\mathrm{Trans}(\mathbb{R}^3) \cap G$, is a free abelian group $\mathbb{Z} \oplus \mathbb{Z} \oplus \mathbb{Z}$, and its image,
a point group, is a finite subgroup of $O_3(\mathbb{R})$. There are 230 crystallographic
groups.


Exercises 

6.31 (i) If $\varphi \in \mathrm{Isom}(\mathbb{R}^2)$, then $\varphi(z) = e^{i\theta} z + c$ or $\varphi(z) = e^{i\theta} \bar{z} + c$. Prove that $\theta$
and $c$ are uniquely determined by $\varphi$.

(ii) Prove that the function $f : \mathrm{Isom}(\mathbb{R}^2) \to O_2(\mathbb{R})$, defined by $\varphi \mapsto \tau_{\varphi(0)}^{-1} \varphi$,
is a homomorphism, where $\tau_{\varphi(0)}$ is the translation $z \mapsto z + \varphi(0)$. Prove
that the homomorphism $f$ is surjective and that its kernel is the subgroup
$T$ of all the translations. Conclude that $T \lhd \mathrm{Isom}(\mathbb{R}^2)$.

6.32 Prove that (p: (x, y) i-> (x + 2, —y) is an isometry. What type of isometry is it? 

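6.33 Verify the following formulas.

(i) If $\tau : z \mapsto z + c$, then $\tau^{-1} : z \mapsto z - c$.

(ii) If $R : z \mapsto e^{i\theta} z + c$, then $R^{-1} : z \mapsto e^{-i\theta}(z - c)$.

(iii) If $\varphi : z \mapsto e^{i\theta} \bar{z} + c$, then $\varphi^{-1} : z \mapsto e^{i\theta}\,\overline{(z - c)}$.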

(iv) Give an example of isometries a and /l such that a and are not 

isometries of the same type. 

6.34 (i) Prove that conjugate elements in $\mathrm{Isom}(\mathbb{R}^2)$ have the same number of
fixed points.

(ii) Prove that if $\varphi$ is a rotation and $\psi$ is a reflection, then $\varphi$ and $\psi$ are not
conjugate in $\mathrm{Isom}(\mathbb{R}^2)$.

6.35 If $\varphi$ and $\psi$ are rotations in $\mathrm{Isom}(\mathbb{R}^2)$ having different fixed points, prove that the
subgroup $\langle \varphi, \psi \rangle$ they generate is infinite.

6.36 If $\varphi \in \mathrm{Isom}(\mathbb{R}^2)$ fixes three noncollinear points, prove that $\varphi$ is the identity.

6.37 (i) Prove that the composite of two reflections in $\mathrm{Isom}(\mathbb{R}^2)$ is either a
rotation or a translation.

(ii) Prove that every rotation is a composite of two reflections. Prove that
every translation is a composite of two reflections.

(iii) Prove that every isometry $\mathbb{R}^2 \to \mathbb{R}^2$ is a composite of at most three
reflections.

6.38 If $H$ denotes the subgroup of $\mathrm{Isom}(\mathbb{R}^2)$ consisting of all isometries which
stabilize $\mathbb{R}$, prove that complex conjugation lies in the center, $Z(H)$.

*6.39 (i) If $\rho$ is a reflection in $O_2(\mathbb{R})$, prove that there is a rotation $R \in O_2(\mathbb{R})$
with $R\rho R^{-1} = \sigma$, where $\sigma(z) = \bar{z}$.

(ii) If $G$ is a subgroup of $O_2(\mathbb{R})$ containing a reflection $\rho$, prove that there is
a rotation $R \in \mathrm{Isom}(\mathbb{R}^2)$ with $RGR^{-1}$ containing complex conjugation.

*6.40 Prove that the composite of two reflections is either the identity or a rotation.

*6.41 Prove that if a frieze group $G$ contains two of the following types of isometry:
half-turn; glide-reflection; reflection with vertical axis, then $G$ contains an isometry of
the third type.

*6.42 Prove that any two infinite dihedral groups are isomorphic. In more detail, let
$G = \langle a, b \rangle$ and $H = \langle c, d \rangle$ be infinite groups in which $a^2 = 1$, $aba = b^{-1}$,
$c^2 = 1$, and $cdc = d^{-1}$. Prove that $G \cong H$.

6.43 Find the isometry group of the following friezes. 

(i) SANTACLAUSSANTACLAUSSANTA 

(ii) HOHOHOHOHOHOHOHOHOHOHO 

(iii) /\/\/\/\/\/\ 

(iv) /////// 

\ \ \ \ \ \ \ 

(V) S\/\S\S\S\S\ 

\/\/\/\/\/\/ 

(vi) / \ /\ /\ /\ 

\ / \ / w 

(vii) //////////// 

wwwwww 
7.1 Prime Ideals and Maximal Ideals

Our main interest in this chapter is the study of polynomials in several
variables. One sees in analytic geometry that polynomials correspond to geometric
figures; for example, $f(x, y) = x^2/a^2 + y^2/b^2 - 1$ is intimately related to an
ellipse in the plane $\mathbb{R}^2$. But there is a very strong connection between the rings
$k[x_1, \ldots, x_n]$, where $k$ is a field, and the geometry of subsets of $k^n$ going far
beyond this. Given a set of polynomials $f_1, \ldots, f_t$ of $n$ variables, call the subset
$V \subseteq k^n$ consisting of their common zeros an algebraic set. Of course, one can
study algebraic sets because solutions of systems of polynomial equations (an
obvious generalization of systems of linear equations) are intrinsically
interesting, but they do arise quite naturally. Investigating a problem often leads to a
parametrization of its solutions by an algebraic set, and so understanding the
algebraic set and its properties, e.g., irreducibility, dimension, genus, singularities,
and so forth, leads to an understanding of the original problem. The interplay
between $k[x_1, \ldots, x_n]$ and algebraic sets has evolved into what is nowadays called
Algebraic Geometry, and this chapter may be regarded as an introduction to this
subject.

As usual, it is simpler to begin by looking at the more general setting - in
this case, commutative rings - before getting involved with polynomial rings. A
great deal of the number theory we have presented involves divisibility: given
two integers $a$ and $b$, when does $a \mid b$; that is, when is $b$ a multiple of $a$?
This question translates into a question about principal ideals, for $a \mid b$ if and
only if $(b) \subseteq (a)$ [if $b = ac$, then $b \in (a)$ and $rb \in (a)$ for all $r \in R$]. We
now introduce two especially interesting types of ideal: prime ideals, which are
related to Euclid's lemma, and maximal ideals.

Let us begin with the analog of Theorem 2.121, the correspondence theorem
for groups. Recall that if $f : X \to Y$ is a function and $B \subseteq Y$ is a subset, then

$f^{-1}(B) = \{x \in X : f(x) \in B\}$.


Proposition 7.1 (Correspondence Theorem for Rings). If $I$ is a proper
ideal in a commutative ring $R$, then the natural map $\pi : R \to R/I$ induces an
inclusion-preserving bijection $\pi'$ from the set of all intermediate ideals $J$ (that
is, $I \subseteq J \subseteq R$) to the set of all the ideals in $R/I$, given by

$\pi' : J \mapsto J/I = \{a + I : a \in J\}$.

Thus, every ideal in the quotient ring $R/I$ has the form $J/I$ for some unique
intermediate ideal $J$.

Proof. If one forgets its multiplication, the commutative ring $R$ is an additive
abelian group and its ideals $I$ are (normal) subgroups. The correspondence
theorem for groups, Theorem 2.121, now applies, and it gives an inclusion-preserving
bijection

$\pi_* : \{\text{all subgroups of } R \text{ containing } I\} \to \{\text{all subgroups of } R/I\}$,

where $\pi_*(J) = J/I$.

If $J$ is an ideal, then $\pi_*(J)$ is also an ideal, for if $r \in R$ and $a \in J$, then
$ra \in J$, and so

$(r + I)(a + I) = ra + I \in J/I$.

Let $\pi'$ be the restriction of $\pi_*$ to the set of intermediate ideals $J$. Now $\pi'$ is
an injection because $\pi_*$ is a bijection. To see that $\pi'$ is surjective, let $J^*$ be an
ideal in $R/I$. Then $\pi^{-1}(J^*)$ is an intermediate ideal in $R$, by Exercise 3.45
on page 248 [it contains $I = \pi^{-1}(\{0\})$], and
$\pi'(\pi^{-1}(J^*)) = \pi^{-1}(J^*)/I = \pi(\pi^{-1}(J^*)) = J^*$, by Lemma 2.14.
Thus, if $J = \pi^{-1}(J^*)$, then $J^* = J/I$. •


Example 7.2. 

Let $I = (m)$ be a nonzero ideal in $\mathbb{Z}$. Every ideal $J$ in $\mathbb{Z}$ is principal, say,
$J = (a)$, and $(m) \subseteq (a)$ if and only if $a \mid m$. By the correspondence theorem,
every ideal in $\mathbb{I}_m$ has the form $([a])$ for some divisor $a$ of $m$. ◄
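
For small $m$, this description of the ideals of $\mathbb{I}_m$ can be confirmed by brute force. The Python sketch below is only an illustration; the function names `ideals_of_Zm` and `divisors` are hypothetical, and the code uses the fact, stated in the example, that every ideal of $\mathbb{I}_m$ is principal.

```python
def ideals_of_Zm(m):
    """Return the ideals of the integers mod m, each as a frozenset of residues.

    Every ideal is principal, so it suffices to list the sets ([a]) = {ar mod m}.
    """
    return {frozenset((a * r) % m for r in range(m)) for a in range(m)}

def divisors(m):
    """Return the positive divisors of m."""
    return [d for d in range(1, m + 1) if m % d == 0]

m = 12
ideals = ideals_of_Zm(m)
# One ideal for each divisor of m: the multiples of that divisor, reduced mod m.
from_divisors = {frozenset((d * r) % m for r in range(m)) for d in divisors(m)}
print(len(ideals), len(divisors(m)))
print(ideals == from_divisors)
```

Note that the divisor $a = m$ gives the zero ideal and $a = 1$ gives the whole ring.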


Definition. An ideal $I$ in a commutative ring $R$ is called a prime ideal if it is a
proper ideal, that is, $I \ne R$, and $ab \in I$ implies $a \in I$ or $b \in I$.

Example 7.3.

(i) Recall that a nonzero commutative ring $R$ is a domain if and only if $ab = 0$
in $R$ implies $a = 0$ or $b = 0$. Thus, the ideal $\{0\}$ in $R$ is a prime ideal if
and only if $R$ is a domain.

(ii) We claim that the prime ideals in $\mathbb{Z}$ are precisely the ideals $(p)$, where
either $p = 0$ or $p$ is a prime. Since $m$ and $-m$ generate the same principal
ideal, we may restrict our attention to nonnegative generators. If $p = 0$,
then the result follows from part (i), for $\mathbb{Z}$ is a domain. If $p$ is a prime, we
show first that $(p)$ is a proper ideal; otherwise, $1 \in (p)$, and there would be
an integer $a$ with $ap = 1$, a contradiction. Next, if $ab \in (p)$, then $p \mid ab$.
By Euclid's lemma, either $p \mid a$ or $p \mid b$; that is, either $a \in (p)$ or $b \in (p)$.
Therefore, $(p)$ is a prime ideal.

Conversely, if $m > 1$ is not a prime, then it has a factorization $m = ab$
with $0 < a < m$ and $0 < b < m$. Thus, neither $a$ nor $b$ is a multiple of $m$,
hence neither lies in $(m)$, and so $(m)$ is not a prime ideal. ◄

The proof in the example works in more generality. 

Proposition 7.4. If $k$ is a field, then a nonzero polynomial $p(x) \in k[x]$ is
irreducible if and only if $(p(x))$ is a prime ideal.

Proof. Suppose that $p(x)$ is irreducible. First, $(p)$ is a proper ideal; otherwise,
$k[x] = (p)$ and hence $1 \in (p)$, so there is a polynomial $f(x)$ with $1 = p(x)f(x)$.
But $p(x)$ has degree at least 1, whereas

$0 = \deg(1) = \deg(pf) = \deg(p) + \deg(f) \ge \deg(p) \ge 1$.

This contradiction shows that $(p)$ is a proper ideal.

Second, if $ab \in (p)$, then $p \mid ab$, and so Euclid's lemma in $k[x]$ gives $p \mid a$
or $p \mid b$. Thus, $a \in (p)$ or $b \in (p)$. It follows that $(p)$ is a prime ideal.

Conversely, suppose that $p(x)$ is not irreducible; there is thus a factorization

$p(x) = a(x)b(x)$

with $\deg(a) < \deg(p)$ and $\deg(b) < \deg(p)$. As every nonzero polynomial
$g(x) \in (p)$ has the form $g(x) = d(x)p(x)$ for some $d(x) \in k[x]$, we have
$\deg(g) \ge \deg(p)$; it follows that neither $a(x)$ nor $b(x)$ lies in $(p)$, and so $(p)$ is
not a prime ideal. •

Proposition 7.5. A proper ideal $I$ in a commutative ring $R$ is a prime ideal if
and only if $R/I$ is a domain.

Proof. Let $I$ be a prime ideal. Since $I$ is a proper ideal, we have $1 \notin I$ and so
$1 + I \ne 0 + I$ in $R/I$. If $0 = (a + I)(b + I) = ab + I$, then $ab \in I$. Since $I$
is a prime ideal, either $a \in I$ or $b \in I$; that is, either $a + I = 0$ or $b + I = 0$.
Hence, $R/I$ is a domain. The converse is just as easy. •

Here is a second interesting type of ideal. 

Definition. An ideal $I$ in a commutative ring $R$ is a maximal ideal if $I$ is a
proper ideal and there is no ideal $J$ with $I \subsetneq J \subsetneq R$.

Thus, if $I$ is a maximal ideal in a commutative ring $R$ and if $J$ is a proper
ideal with $I \subseteq J$, then $I = J$.

The prime ideals in the polynomial ring $k[x_1, \ldots, x_n]$ can be quite
complicated, but when $k$ is algebraically closed, Hilbert's Nullstellensatz
(Theorem 7.45) says that every maximal ideal has the form $(x_1 - a_1, \ldots, x_n - a_n)$
for some point $(a_1, \ldots, a_n) \in k^n$.

We may restate Proposition 3.43 in the present language. 

Lemma 7.6. The ideal $\{0\}$ is a maximal ideal in a commutative ring $R$ if and
only if $R$ is a field.

Proof. It is shown in Proposition 3.43 that every nonzero ideal $I$ in $R$ is equal
to $R$ itself if and only if every nonzero element in $R$ is a unit. That is, $\{0\}$ is a
maximal ideal if and only if $R$ is a field. •

Proposition 7.7. A proper ideal $I$ in a commutative ring $R$ is a maximal ideal
if and only if $R/I$ is a field.

Proof. The correspondence theorem for rings shows that $I$ is a maximal ideal
if and only if $R/I$ has no ideals other than $\{0\}$ and $R/I$ itself; Lemma 7.6 shows
that this property holds if and only if $R/I$ is a field. •

Corollary 7.8. Every maximal ideal I in a commutative ring R is a prime ideal. 

Proof. If $I$ is a maximal ideal, then $R/I$ is a field. Since every field is a domain,
$R/I$ is a domain, and so $I$ is a prime ideal. •
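
Propositions 7.5 and 7.7 can be tested on the ideals $(m)$ in $\mathbb{Z}$ with $m > 1$, where the quotient is the ring of integers mod $m$. The following Python sketch is an illustration only; the names `is_domain`, `is_field`, and `is_prime` are hypothetical, and the checks are brute-force searches over residues, not an efficient method.

```python
def is_domain(m):
    """True if the integers mod m (m > 1) have no zero divisors."""
    return m > 1 and all((a * b) % m != 0
                         for a in range(1, m) for b in range(1, m))

def is_field(m):
    """True if every nonzero residue mod m (m > 1) has a multiplicative inverse."""
    return m > 1 and all(any((a * b) % m == 1 for b in range(1, m))
                         for a in range(1, m))

def is_prime(m):
    """Trial-division primality test."""
    return m > 1 and all(m % d != 0 for d in range(2, int(m ** 0.5) + 1))

for m in range(2, 13):
    print(m, is_prime(m), is_domain(m), is_field(m))
```

For each $m > 1$ the three answers coincide: $(m)$ is a prime ideal exactly when it is a maximal ideal, and this happens exactly when $m$ is prime. The zero ideal in $\mathbb{Z}$ is not covered by this check; it is prime but not maximal.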

Example 7.9. 

The converse of the last corollary is false. For example, consider the principal
ideal $(x)$ in $\mathbb{Z}[x]$. By Exercise 3.87 on page 303, we have

$\mathbb{Z}[x]/(x) \cong \mathbb{Z}$;

since $\mathbb{Z}$ is a domain, $(x)$ is a prime ideal; since $\mathbb{Z}$ is not a field, $(x)$ is not a
maximal ideal.

It is not difficult to exhibit a proper ideal $J$ strictly containing $(x)$; let

$J = \{f(x) \in \mathbb{Z}[x] : f(x) \text{ has even constant term}\}$.

Since $\mathbb{Z}[x]/J \cong \mathbb{F}_2$, which is a field, it follows that $J$ is a maximal ideal
containing $(x)$. ◄

Corollary 7.10. If $k$ is a field, then $(x_1 - a_1, \ldots, x_n - a_n)$ is a maximal ideal
in $k[x_1, \ldots, x_n]$, where $a_i \in k$ for $i = 1, \ldots, n$.

Proof. By Proposition 3.33, there is a unique homomorphism

$\varphi : k[x_1, \ldots, x_n] \to k[x_1, \ldots, x_n]$

with $\varphi(c) = c$ for all $c \in k$ and with $\varphi(x_i) = x_i - a_i$ for all $i$. It is easy to see that
$\varphi$ is an isomorphism, for its inverse carries $x_i \mapsto x_i + a_i$ for all $i$. It follows that
$I$ is a maximal ideal in $k[x_1, \ldots, x_n]$ if and only if $\varphi(I)$ is a maximal ideal. But
$(x_1, \ldots, x_n)$ is a maximal ideal, for $k[x_1, \ldots, x_n]/(x_1, \ldots, x_n) \cong k$ is a field.
Therefore, $(x_1 - a_1, \ldots, x_n - a_n)$ is a maximal ideal. •

The converse of Corollary 7.8 is true when R is a PID. 

Theorem 7.11. If R is a PID, then every nonzero prime ideal I is a maximal 
ideal. 

Proof. Assume there is a proper ideal $J$ with $I \subseteq J$. Since $R$ is a PID, $I = (a)$
and $J = (b)$ for some $a, b \in R$. Now $a \in J$ implies that $a = rb$ for some $r \in R$,
and so $rb \in I$; but $I$ is a prime ideal, so that $r \in I$ or $b \in I$. If $r \in I$, then
$r = sa$ for some $s \in R$, and so $a = rb = sab$. Since $R$ is a domain and $a \ne 0$
(for $I$ is nonzero), we may cancel $a$ to obtain $1 = sb$, and
Exercise 3.22 on page 232 gives $J = (b) = R$, contradicting $J$ being a proper
ideal. If $b \in I$, then $J \subseteq I$, and so $J = I$. Therefore, $I$ is a maximal ideal. •

We can now give a second proof of Proposition 3.113. 

Corollary 7.12. If $k$ is a field and $p(x) \in k[x]$ is irreducible, then the quotient
ring $k[x]/(p(x))$ is a field.

Proof. Since $p(x)$ is irreducible, Proposition 7.4 says that the principal ideal
$I = (p(x))$ is a nonzero prime ideal; since $k[x]$ is a PID, $I$ is a maximal ideal,
and so $k[x]/I$ is a field. •
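
Corollary 7.12 can be seen at work in small cases. The Python sketch below is an illustration under stated assumptions: $k = \mathbb{F}_p$ for a prime $p$, the modulus is a monic quadratic $x^2 + bx + c$, and an element $a_0 + a_1 x$ of the quotient is stored as the pair `(a0, a1)`. The function name `quotient_is_field` is hypothetical. Products are reduced using $x^2 = -bx - c$.

```python
from itertools import product

def quotient_is_field(p, b, c):
    """Test by brute force whether F_p[x]/(x^2 + b x + c) is a field."""
    elems = list(product(range(p), repeat=2))   # pairs (a0, a1) mean a0 + a1*x

    def mul(u, v):
        a0, a1 = u
        b0, b1 = v
        top = a1 * b1                            # coefficient of x^2 before reduction
        # Replace x^2 by -b*x - c and reduce the coefficients mod p.
        return ((a0 * b0 - c * top) % p, (a0 * b1 + a1 * b0 - b * top) % p)

    one = (1, 0)
    return all(any(mul(u, v) == one for v in elems)
               for u in elems if u != (0, 0))

print(quotient_is_field(2, 1, 1))   # x^2 + x + 1 over F_2, irreducible
print(quotient_is_field(2, 0, 1))   # x^2 + 1 = (x + 1)^2 over F_2, reducible
print(quotient_is_field(3, 0, 1))   # x^2 + 1 over F_3, no roots, irreducible
```

The irreducible moduli give fields (with 4 and 9 elements), while the reducible modulus gives a ring in which $x + 1$ is a nonzero element with no inverse.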

Does every commutative ring $R$ contain a maximal ideal? The (positive)
answer to this question involves Zorn's lemma, a theorem equivalent to the Axiom
of Choice, which is usually discussed in a sequel course (but see Corollary 7.27).


Exercises

7.1 (i) Find all the maximal ideals in $\mathbb{Z}$.

(ii) Find all the maximal ideals in $k[x]$, where $k$ is a field.

(iii) Find all the maximal ideals in $k[[x]]$, where $k$ is a field.

7.2 Recall that a Boolean ring is a commutative ring $R$ for which $a^2 = a$ for all $a \in R$.
Prove that every prime ideal in a Boolean ring is a maximal ideal.

*7.3 (i) Give an example of a commutative ring containing two prime ideals $P$
and $Q$ for which $P \cap Q$ is not a prime ideal.

(ii) If $P_1 \supseteq P_2 \supseteq \cdots \supseteq P_n \supseteq P_{n+1} \supseteq \cdots$ is a decreasing sequence of prime
ideals in a commutative ring $R$, prove that $\bigcap_{n \ge 1} P_n$ is a prime ideal.

7.4 Let $f : R \to S$ be a ring homomorphism.

(i) If $Q$ is a prime ideal in $S$, prove that $f^{-1}(Q)$ is a prime ideal in $R$.
Conclude, in the correspondence theorem, that if $J/I$ is a prime ideal in
$R/I$, where $I \subseteq J \subseteq R$, then $J$ is a prime ideal in $R$.

(ii) Give an example to show that if $P$ is a prime ideal in $R$, then $f(P)$ need
not be a prime ideal in $S$.

7.5 Let $k$ be a field, and let $a = (a_1, \ldots, a_n) \in k^n$. Define the evaluation map
$e_a : k[x_1, \ldots, x_n] \to k$ by

$e_a : f(x_1, \ldots, x_n) \mapsto f(a) = f(a_1, \ldots, a_n)$.

(i) Prove that $e_a$ is surjective, and conclude that $\ker e_a$ is a maximal ideal in
$k[x_1, \ldots, x_n]$.

(ii) Prove that $(x_1 - a_1, \ldots, x_n - a_n)$ is a maximal ideal in $k[x_1, \ldots, x_n]$ by
showing that $\ker e_a = (x_1 - a_1, \ldots, x_n - a_n)$. (This is a second proof of
Corollary 7.10.)

7.6 (i) Find all the maximal ideals in $k[x]$, where $k$ is an algebraically closed
field.

(ii) Find all the maximal ideals in $\mathbb{R}[x]$.

(iii) If $k$ is an algebraically closed field, prove that the function

$k \to \{\text{maximal ideals in } k[x]\}$,

given by $a \mapsto (x - a)$, the principal ideal in $k[x]$ generated by $x - a$, is
a bijection.

7.7 (i) Prove that if $x_i - b \in (x_1 - a_1, \ldots, x_n - a_n)$ for some $i$, where $k$ is a
field and $b \in k$, then $b = a_i$.

(ii) Prove that $\mu : k^n \to \{\text{maximal ideals in } k[x_1, \ldots, x_n]\}$, given by

$\mu : (a_1, \ldots, a_n) \mapsto (x_1 - a_1, \ldots, x_n - a_n)$,

is an injection, and give an example of a field $k$ for which $\mu$ is not a
surjection.

7.8 Prove that if $P$ is a prime ideal in a commutative ring $R$ and if $r^n \in P$ for some
$r \in R$ and $n \ge 1$, then $r \in P$.

7.9 Prove that the ideal $(x^2 - 2, y^2 + 1, z)$ in $\mathbb{Q}[x, y, z]$ is a proper ideal.

7.10 Call a nonempty subset $S$ of a commutative ring $R$ multiplicatively closed if $0 \notin S$
and, if $s, s' \in S$, then $ss' \in S$. Prove that an ideal $I$ which is maximal with the
property that $I \cap S = \varnothing$ is a prime ideal. (The existence of such an ideal $I$ can be
proved using Zorn's lemma.)

*7.11 (i) If $I$ and $J$ are ideals in a commutative ring $R$, define

$IJ = \{\sum_{\ell} a_\ell b_\ell : a_\ell \in I \text{ and } b_\ell \in J\}$.

Prove that $IJ$ is an ideal in $R$ and that $IJ \subseteq I \cap J$.

(ii) Let $R = k[x, y]$, where $k$ is a field, and let $I = (x, y) = J$. Prove that
$I^2 = IJ \subsetneq I \cap J = I$.

7.12 Let $P$ be a prime ideal in a commutative ring $R$. If there are ideals $I$ and $J$ in $R$
with $IJ \subseteq P$, prove that $I \subseteq P$ or $J \subseteq P$.

7.13 If $I$ and $J$ are ideals in a commutative ring $R$, define the colon ideal

$(I : J) = \{r \in R : rJ \subseteq I\}$.

(i) Prove that $(I : J)$ is an ideal containing $I$.

(ii) Let $R$ be a domain and let $a, b \in R$, where $b \ne 0$. If $I = (ab)$ and
$J = (b)$, prove that $(I : J) = (a)$.

7.14 Let $I$ and $J$ be ideals in a commutative ring $R$.

(i) Prove that there is an injection $\varphi : R/(I \cap J) \to R/I \times R/J$ given by
$\varphi : r + (I \cap J) \mapsto (r + I, r + J)$.

(ii) Call $I$ and $J$ coprime if $I + J = R$. Prove that the ring homomorphism
$\varphi : R/(I \cap J) \to R/I \times R/J$ is a surjection if $I$ and $J$ are coprime.

(iii) Generalize the Chinese remainder theorem as follows. Let $R$ be a
commutative ring and let $I_1, \ldots, I_n$ be pairwise coprime ideals; that is, $I_i$ and
$I_j$ are coprime for all $i \ne j$. Prove that if $a_1, \ldots, a_n \in R$, then there
exists $r \in R$ with $r + I_i = a_i + I_i$ for all $i$.

7.15 A commutative ring $R$ is called a local ring if it has a unique maximal ideal.

(i) If $p$ is a prime, prove that

$\{a/b \in \mathbb{Q} : p \nmid b\}$

is a local ring.

(ii) If $R$ is a local ring with unique maximal ideal $M$, prove that $a \in R$ is a
unit if and only if $a \notin M$.

(iii) If $k$ is a field, prove that $k[[x]]$ is a local ring.


7.2 Unique Factorization 

We have proved unique factorization theorems in $\mathbb{Z}$ and in $k[x]$, where $k$ is a
field. In fact, we have proved a common generalization of these two results:
every euclidean ring has unique factorization. Our aim now is to generalize
this result, first to general PID's, and then to $R[x]$, where $R$ is a ring having
unique factorization. It will then follow that there is unique factorization in the
ring $k[x_1, \ldots, x_n]$ of all polynomials in several variables over a field $k$. One
immediate consequence is that any two polynomials in several variables have a
gcd.

We begin by generalizing some earlier definitions. Recall that elements $a$
and $b$ in a commutative ring $R$ are associates if there exists a unit $u \in R$ with
$b = ua$. For example, in $\mathbb{Z}$, the units are $\pm 1$, and so the only associates of an
integer $m$ are $\pm m$; in $k[x]$, where $k$ is a field, the units are the nonzero constants,
and so the only associates of a polynomial $f(x) \in k[x]$ are the polynomials
$uf(x)$, where $u \in k$ and $u \ne 0$. The only units in $\mathbb{Z}[x]$ are $\pm 1$, and so the only
associates of a polynomial $f(x) \in \mathbb{Z}[x]$ are $\pm f(x)$.

Consider two principal ideals $(a)$ and $(b)$ in a commutative ring $R$. It is
easy to see that the following are equivalent: $a \in (b)$; $a = rb$ for some $r \in R$;
$(a) \subseteq (b)$. We can say more when $R$ is a domain.

Proposition 7.13. Let $R$ be a domain and let $a, b \in R$.

(i) $a \mid b$ and $b \mid a$ if and only if $a$ and $b$ are associates.

(ii) The principal ideals $(a)$ and $(b)$ are equal if and only if $a$ and $b$ are
associates.


Proof.

(i) This is Proposition 3.15.

(ii) If $(a) = (b)$, then $(a) \subseteq (b)$ and $(b) \subseteq (a)$; hence, $a \in (b)$ and $b \in (a)$.
Thus, $a \mid b$ and $b \mid a$; by part (i), $a$ and $b$ are associates. The converse is easy,
and one does not need to assume that $R$ is a domain to prove it. •

The notions of prime number in $\mathbb{Z}$ or irreducible polynomial in $k[x]$, where
$k$ is a field, have a common generalization.

Definition. An element $p$ in a commutative ring $R$ is irreducible if it is neither
0 nor a unit and if its only factors are associates of $p$ or units.

For example, the irreducibles in $\mathbb{Z}$ are the numbers $\pm p$, where $p$ is a prime,
and the irreducibles in $k[x]$, where $k$ is a field, are the irreducible polynomials
$p(x)$; that is, $\deg(p) \ge 1$ and $p(x)$ has no factorization $p(x) = f(x)g(x)$ where
$\deg(f) < \deg(p)$ and $\deg(g) < \deg(p)$. This characterization of irreducible
polynomial does not persist in rings $R[x]$ when $R$ is not a field. For example, in
$\mathbb{Z}[x]$, the polynomial $f(x) = 2x + 2$ cannot be factored into two polynomials,
each having degree smaller than $\deg(f) = 1$, yet $f(x)$ is not irreducible, for in
the factorization $2x + 2 = 2(x + 1)$, neither 2 nor $x + 1$ is a unit.

Definition. If $R$ is a commutative ring, then an element $r \in R$ is a product
of irreducibles if $r$ is neither 0 nor a unit and there exist irreducible $p_1, \ldots, p_n$,
where $n \ge 1$, with $r = p_1 \cdots p_n$.

Note the instance of the definition when $n = 1$; every irreducible element in
$R$ is a product of irreducibles (it is a product with one factor!).

Here is the definition we have been seeking.

Definition. A domain $R$ is a unique factorization domain (UFD) if

(i) every $r \in R$, neither 0 nor a unit, is a product of irreducibles;

(ii) if $p_1 \cdots p_m = q_1 \cdots q_n$, where $p_i$ and $q_j$ are irreducible, then $m = n$ and
there is a permutation $\sigma \in S_n$ with $p_i$ and $q_{\sigma(i)}$ associates for all $i$.

When we proved that $\mathbb{Z}$ and $k[x]$, for $k$ a field, have unique factorization into
irreducibles, we did not mention associates because, in each case, irreducible
elements were always replaced by favorite choices of associates: in $\mathbb{Z}$, positive
irreducibles, i.e., primes, are chosen; in $k[x]$, monic irreducible polynomials are
chosen. The reader should see, for example, that the statement "$\mathbb{Z}$ is a UFD" is
just a restatement of the fundamental theorem of arithmetic.

The proof that every PID is a UFD uses a new idea: chains of ideals. 

Lemma 7.14. Let $R$ be a PID.

(i) There is no infinite strictly ascending chain of ideals

$I_1 \subsetneq I_2 \subsetneq \cdots \subsetneq I_n \subsetneq I_{n+1} \subsetneq \cdots$.

(ii) If $r \in R$ is neither 0 nor a unit, then $r$ is a product of irreducibles.

Proof.

(i) If, on the contrary, an infinite strictly ascending chain exists, then define
$J = \bigcup_{n \ge 1} I_n$. We claim that $J$ is an ideal. If $a \in J$, then $a \in I_n$ for some $n$; if $r \in R$,
then $ra \in I_n$, because $I_n$ is an ideal; hence, $ra \in J$. If $a, b \in J$, then there are
ideals $I_n$ and $I_m$ with $a \in I_n$ and $b \in I_m$; since the chain is ascending, we may
assume that $I_n \subseteq I_m$, and so $a, b \in I_m$. As $I_m$ is an ideal, $a - b \in I_m$ and, hence,
$a - b \in J$. Therefore, $J$ is an ideal.

Since $R$ is a PID, we have $J = (d)$ for some $d \in J$. Now $d$ got into $J$ by
being in $I_n$ for some $n$. Hence

$J = (d) \subseteq I_n \subsetneq I_{n+1} \subseteq J$,

and this is a contradiction.

(ii) If $r$ is a divisor of an element $a \in R$, then $a = rs$; $r$ is called a proper divisor
of $a$ if neither $r$ nor $s$ is a unit. We first show that if $r$ is a proper divisor of $a$,
then $(a) \subsetneq (r)$. By Proposition 7.13, $(a) \subseteq (r)$ and, if the inclusion is not strict,
then $a$ and $r$ are associates. In the latter case, there is a unit $u \in R$ with $a = ur$,
and this contradicts $r$ being a proper divisor of $a$.

Call a nonzero nonunit $a \in R$ good if it is a product of irreducibles; otherwise,
call $a$ bad. We must show that there are no bad elements. If $a$ is bad, it is not
irreducible, and so $a = rs$, where both $r$ and $s$ are proper divisors. But the
product of good elements is good, and so at least one of the factors, say $r$, is bad.
The previous paragraph shows that $(a) \subsetneq (r)$. It follows, by induction, that there
exists a sequence $a = a_1, r = a_2, \ldots, a_n, \ldots$ of bad elements with each $a_{n+1}$ a
proper divisor of $a_n$, and this sequence yields a strictly ascending chain

$(a_1) \subsetneq (a_2) \subsetneq \cdots \subsetneq (a_n) \subsetneq (a_{n+1}) \subsetneq \cdots$,

contradicting part (i) of this lemma. •

Proposition 7.15. Let $R$ be a domain in which every $r \in R$, neither 0 nor
a unit, is a product of irreducibles. Then $R$ is a UFD if and only if, for every
irreducible element $p \in R$, the principal ideal $(p)$ is a prime ideal in $R$.

Proof. Assume that $R$ is a UFD. If $a, b \in R$ and $ab \in (p)$, then there is $r \in R$
with

$ab = rp$.

Factor each of $a$, $b$, and $r$ into irreducibles; by unique factorization, the left side
of the equation must involve an associate of $p$. This associate arose as a factor
of $a$ or $b$, and hence $a \in (p)$ or $b \in (p)$.

The proof of the converse is merely an adaptation of the proof of the
fundamental theorem of arithmetic. Assume that

$p_1 \cdots p_m = q_1 \cdots q_n$,

where the $p_i$'s and the $q_j$'s are irreducible elements. We prove, by induction
on $\max\{m, n\} \ge 1$, that $n = m$ and the $q$'s can be reindexed so that $q_i$ and $p_i$
are associates for all $i$. The base step $\max\{m, n\} = 1$ has $p_1 = q_1$, and the
result is obviously true. For the inductive step, the given equation shows that
$p_1 \mid q_1 \cdots q_n$. By hypothesis, $(p_1)$ is a prime ideal (which is the analog of
Euclid's lemma), and so there is some $q_j$ with $p_1 \mid q_j$. But $q_j$, being irreducible,
has no divisors other than units and associates, so that $q_j$ and $p_1$ are associates:
$q_j = up_1$ for some unit $u$. Canceling $p_1$ from both sides, we have
$p_2 \cdots p_m = u q_1 \cdots \widehat{q_j} \cdots q_n$. By the inductive hypothesis, $m - 1 = n - 1$ (so that $m = n$),
and, after possible reindexing, $q_i$ and $p_i$ are associates for all $i$. •

Theorem 7.16. If $R$ is a PID, then $R$ is a UFD. In particular, every
euclidean ring is a UFD.

Proof. In view of the last two results, it suffices to prove that $(p)$ is a prime
ideal whenever $p$ is irreducible. Suppose that $p \mid ab$; we must show that $p \mid b$
or $p \mid a$. The subset

$I = \{sb + tp : s, t \in R\}$

is an ideal in $R$ and, hence, $I = (d)$ because $R$ is a PID. Now $b, p \in I$, so that
$d \mid p$ and $d \mid b$. Since $p$ is irreducible, either $d$ is an associate of $p$ or $d$ is a
unit. In the first case, $d = up$ for some unit $u$, and so $d \mid b$ implies $p \mid b$. In
the second case, $d$ is a unit. Now $d = sb + tp$, and so $da = sab + tap$. Since
$p \mid ab$, we have $p \mid da$. But $da$ is an associate of $a$, and so $p \mid a$. It follows that
$(p)$ is a prime ideal. •
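
In $\mathbb{Z}$, the generator $d$ of the ideal $\{sb + tp\}$ and the coefficients $s$, $t$ can be computed explicitly, which makes the second case of the proof concrete. The Python sketch below is an illustration only; it uses the extended euclidean algorithm, which is specific to euclidean rings and is not available in a general PID. The function name `ext_gcd` and the sample numbers are assumptions chosen for the example.

```python
def ext_gcd(x, y):
    """Return (d, s, t) with d = gcd(x, y) and d = s*x + t*y, for x, y >= 0."""
    if y == 0:
        return (x, 1, 0)
    d, s, t = ext_gcd(y, x % y)
    # d = s*y + t*(x % y) = t*x + (s - (x // y)*t)*y
    return (d, t, s - (x // y) * t)

p, a, b = 7, 21, 10            # p divides a*b = 210, but p does not divide b
d, s, t = ext_gcd(b, p)        # d generates the ideal {s*b + t*p}
print(d, s, t)
print(d == s * b + t * p)
# Here d is a unit, so d*a = s*(a*b) + t*a*p is divisible by p, and hence p | a.
print((s * a * b + t * a * p) % p == 0, a % p == 0)
```

Both terms of $sab + tap$ are multiples of $p$ (the first because $p \mid ab$), which is exactly the step "$p \mid da$" in the proof.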

Recall that the notion of gcd can be defined in any commutative ring. 

Definition. Let $R$ be a commutative ring and let $a_1, \ldots, a_n \in R$. A common
divisor of $a_1, \ldots, a_n$ is an element $c \in R$ with $c \mid a_i$ for all $i$. A greatest
common divisor or gcd of $a_1, \ldots, a_n$ is a common divisor $d$ with $c \mid d$ for every
common divisor $c$.

Even in the familiar examples of $\mathbb{Z}$ and $k[x]$, gcd's are not unique unless
an extra condition is imposed. For example, if $d$ is a gcd of a pair of integers
in $\mathbb{Z}$, as defined above, then $-d$ is also a gcd. To force gcd's to be unique, one
defines nonzero gcd's in $\mathbb{Z}$ to be positive; similarly, in $k[x]$, where $k$ is a field, one
imposes the condition that nonzero gcd's are monic polynomials. In a general
PID, however, elements may not have favorite associates.

If $R$ is a domain, then it is easy to see that if $d$ and $d'$ are gcd's of elements
$a_1, \ldots, a_n$, then $d \mid d'$ and $d' \mid d$. It follows from Proposition 7.13 that $d$ and $d'$
are associates and, hence, that $(d) = (d')$. Thus, gcd's are not unique, but they
all generate the same principal ideal.

In Exercise 3.75 on page 272, we saw that there exist domains R containing 
a pair of elements having no gcd. However, the idea in Proposition 1.52 carries 
over to show that gcd’s do exist in UFD’s. 

Proposition 7.17. If R is a UFD, then the gcd of any finite set of elements 
a i , . . . , a n exists. 

Proof. It suffices to prove that the gcd of two elements a and b exists, for an 
easy inductive proof then shows that a gcd of any finite number of elements 
exists. 

There are units $u$ and $v$ and distinct irreducibles $p_1, \ldots, p_t$ with

$a = u p_1^{e_1} p_2^{e_2} \cdots p_t^{e_t}$

and

$b = v p_1^{f_1} p_2^{f_2} \cdots p_t^{f_t}$,

where $e_i \ge 0$ and $f_i \ge 0$ for all $i$. It is easy to see that if $c \mid a$, then the
factorization of $c$ into irreducibles is $c = w p_1^{g_1} p_2^{g_2} \cdots p_t^{g_t}$, where $w$ is a unit and
$g_i \le e_i$ for all $i$. Thus, $c$ is a common divisor of $a$ and $b$ if and only if $g_i \le m_i$
for all $i$, where

$m_i = \min\{e_i, f_i\}$.

It is now clear that $p_1^{m_1} p_2^{m_2} \cdots p_t^{m_t}$ is a gcd of $a$ and $b$. •
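
In the UFD $\mathbb{Z}$, this recipe can be carried out directly. The following Python sketch is only an illustration for positive integers; the names `factor` and `gcd_by_exponents` are hypothetical, factoring is done by naive trial division, and the result is compared with the standard library's `math.gcd`.

```python
from collections import Counter
from math import gcd

def factor(n):
    """Return the prime factorization of n >= 1 as a Counter {prime: exponent}."""
    exps = Counter()
    p = 2
    while p * p <= n:
        while n % p == 0:
            exps[p] += 1
            n //= p
        p += 1
    if n > 1:              # whatever remains is itself prime
        exps[n] += 1
    return exps

def gcd_by_exponents(a, b):
    """gcd of positive integers as the product of p^min(e, f) over common primes."""
    fa, fb = factor(a), factor(b)
    d = 1
    for p in fa.keys() & fb.keys():
        d *= p ** min(fa[p], fb[p])
    return d

# 360 = 2^3 * 3^2 * 5 and 84 = 2^2 * 3 * 7, so the gcd is 2^2 * 3 = 12.
print(gcd_by_exponents(360, 84), gcd(360, 84))
```

Primes that occur in only one of the two factorizations have minimum exponent 0 and so contribute nothing, which is why the loop runs over the common primes only.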

We have not proved that a gcd of elements a \ , . . . , a n is a linear combination 
of them; indeed, this may not be true (see Exercise 7.21 on page 532). 

Definition. Elements $a_1, \ldots, a_n$ in a UFD $R$ are called relatively prime if all
their gcd's are units - that is, if every common divisor of $a_1, \ldots, a_n$ is a unit.

We are now going to prove that if $R$ is a UFD, then so is $R[x]$. This theorem
was essentially found by Gauss, and the proof uses ideas in the proof of Gauss's
theorem, Theorem 3.97. It will follow that $k[x_1, \ldots, x_n]$ is a UFD whenever $k$ is
a field.

Definition. A polynomial $f(x) = a_n x^n + \cdots + a_1 x + a_0 \in R[x]$, where $R$ is
a UFD, is called primitive if its coefficients are relatively prime; that is, the only
common divisors of $a_n, \ldots, a_1, a_0$ are units.

Observe that if $f(x)$ is not primitive, then there exists an irreducible $q \in R$
that divides each of its coefficients: if the gcd is a nonunit $d$, then take for $q$ any
irreducible factor of $d$.

Example 7.18. 

We now show, in a UFD $R$, that every irreducible $p(x) \in R[x]$ of positive degree
is primitive. If not, then there is an irreducible $q \in R$ with $p(x) = qg(x)$. Since
$p(x)$ is irreducible, its only factors are units in $R[x]$ and associates in $R[x]$. Now
$q$ is irreducible in $R$; can it be a unit in $R[x]$? Every unit $u \in R[x]$ has degree 0,
i.e., it is a constant, for $uv = 1$ implies $\deg(u) + \deg(v) = \deg(1) = 0$. It
follows that $q$ is not a unit in $R[x]$, for if $qv = 1$ in $R[x]$, then $\deg(v) = 0$, so that $v$
lies in $R$ and $q$ is a unit in $R$, contradicting the irreducibility of $q$ in $R$. Therefore, $q$ must be an
associate of $p(x)$. But associates in $R[x]$ have the same degree (because units
have degree 0). Therefore, $q$ is not an associate of $p(x)$, because the latter has
positive degree. We conclude that $p(x)$ is primitive. ◄

We begin with a generalization of Gauss’s lemma (Lemma 3.93).

Lemma 7.19. If R is a UFD and $f(x), g(x) \in R[x]$ are both primitive, then
their product f(x)g(x) is also primitive.

Proof. If $\pi : R \to R/(p)$ is the natural map $\pi : a \mapsto a + (p)$, then Proposition 3.33
shows that the function $\tilde{\pi} : R[x] \to (R/(p))[x]$, which replaces
each coefficient c of a polynomial by $\pi(c)$, is a ring homomorphism. Now the
hypothesis that a polynomial $h(x) \in R[x]$ is not primitive says there is some
irreducible p such that all the coefficients of $\tilde{\pi}(h)$ are 0 in R/(p); that is, $\tilde{\pi}(h) = 0$
in (R/(p))[x]. Thus, if the product f(x)g(x) is not primitive, there is some
irreducible p with $0 = \tilde{\pi}(fg) = \tilde{\pi}(f)\tilde{\pi}(g)$ in (R/(p))[x]. Since (p) is a prime
ideal, R/(p) is a domain, and hence (R/(p))[x] is also a domain. But neither
$\tilde{\pi}(f)$ nor $\tilde{\pi}(g)$ is 0 in (R/(p))[x], because f and g are primitive, and this
contradicts (R/(p))[x] being a domain. •
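For a concrete check of the lemma, one can take $R = \mathbb{Z}$ and test a single pair of primitive polynomials. The sketch below assumes polynomials are stored as integer coefficient lists, lowest degree first; the names `content`, `is_primitive`, and `poly_mul` are illustrative choices, not notation from the text.

```python
from functools import reduce
from math import gcd


def content(coeffs):
    """A gcd of the coefficients (the nonnegative representative in Z)."""
    return reduce(gcd, coeffs, 0)


def is_primitive(coeffs):
    """True when the only common divisors of the coefficients are the units +1, -1."""
    return content(coeffs) == 1


def poly_mul(f, g):
    """Product of two polynomials given as coefficient lists, lowest degree first."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out


f = [3, 2]        # 2x + 3
g = [10, 0, 3]    # 3x^2 + 10
fg = poly_mul(f, g)   # 6x^3 + 9x^2 + 20x + 30
print(is_primitive(f), is_primitive(g), is_primitive(fg))
```

One example proves nothing, of course; it only shows what the statement says in coordinates. Note that every coefficient of the product is divisible by 2 or by 3 or by 5 individually, yet no single prime divides all of them.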

Definition. If R is a UFD and $f(x) = a_n x^n + \cdots + a_1 x + a_0 \in R[x]$, define
$c(f) \in R$ to be a gcd of $a_n, \ldots, a_1, a_0$; one calls c(f) the content of f(x).

Note that the content of a polynomial f(x) is not unique, but that any two
contents of f(x) are associates.

It is obvious that if $b \in R$ and $b \mid f(x) \in R[x]$, where R is a UFD, then b is
a common divisor of the coefficients of f(x), and so $b \mid c(f)$.

Lemma 7.20. Let R be a UFD.

(i) Every nonzero $f(x) \in R[x]$ has a factorization

$$f(x) = c(f) f^*(x),$$

where $c(f) \in R$ and $f^*(x) \in R[x]$ is primitive.

(ii) This factorization is unique in the sense that if $f(x) = d g^*(x)$, where
$d \in R$ and $g^*(x) \in R[x]$ is primitive, then d and c(f) are associates and
$f^*(x)$ and $g^*(x)$ are associates.

(iii) Let $g^*(x), f(x) \in R[x]$. If $g^*(x)$ is primitive and $g^*(x) \mid b f(x)$, where
$b \in R$, then $g^*(x) \mid f(x)$.

Proof.

(i) If $f(x) = a_n x^n + \cdots + a_1 x + a_0$ and c(f) is the content of f, then there
are factorizations $a_i = c(f) b_i$ in R for $i = 0, 1, \ldots, n$; if we define
$f^*(x) = b_n x^n + \cdots + b_1 x + b_0$, then it is easy to see that $f^*(x)$ is primitive and
$f(x) = c(f) f^*(x)$.

(ii) To prove uniqueness, suppose that $f(x) = d g^*(x)$ is a second such factorization,
as in the statement. Now $c(f) f^*(x) = f(x) = d g^*(x)$, so that, in
Q[x], where Q = Frac(R), we have $f^*(x) = [d/c(f)] g^*(x)$. Exercise 7.17 on
page 532 allows us to write d/c(f) in lowest terms: $d/c(f) = u/v$, where u
and v are relatively prime elements of R. The equation $v f^*(x) = u g^*(x)$ holds
in R[x]; equating like coefficients, v is a common divisor of each coefficient of
$u g^*(x)$. Since u and v are relatively prime, Exercise 7.18 on page 532 gives v
a common divisor of the coefficients of $g^*(x)$. But $g^*(x)$ is primitive, and so v
is a unit. A similar argument shows that u is a unit. Therefore, $d/c(f) = u/v$
is a unit in R, call it w, and $d = w c(f)$; that is, d and c(f) are associates and,
hence, $g^*(x)$ and $f^*(x)$ are associates.

(iii) Since $g^*(x) \mid b f(x)$, there is $h(x) \in R[x]$ with $b f(x) = g^*(x) h(x)$. By
part (i), we have

$$h(x) = c(h) h^*(x) \quad\text{and}\quad f(x) = c(f) f^*(x),$$

where $h^*$ and $f^*$ are primitive. Therefore,

$$b c(f) f^*(x) = c(h) g^*(x) h^*(x).$$

Now $g^*(x) h^*(x)$ is primitive, by Lemma 7.19, and so the uniqueness in part (ii)
gives a unit $u \in R$ with $c(h) = u b c(f)$. Therefore,

$$b f(x) = g^*(x) c(h) h^*(x) = g^*(x) [u b c(f) h^*(x)].$$

Canceling b gives $f(x) = g^*(x) h'(x)$, where $h'(x) = u c(f) h^*(x) \in R[x]$; that
is, $g^*(x) \mid f(x)$. •

Theorem 7.21 (Gauss). If R is a UFD, then R[x] is also a UFD.

Proof. We show first, by induction on deg(f), that every $f(x) \in R[x]$, neither
zero nor a unit, is a product of irreducibles. If deg(f) = 0, then f(x) is a
constant, hence lies in R. Since R is a UFD, f is a product of irreducibles. If
deg(f) > 0, then $f(x) = c(f) f^*(x)$, where $c(f) \in R$ and $f^*(x)$ is primitive.
Now c(f) is either a unit or a product of irreducibles, by the base step. If $f^*(x)$
is irreducible, we are done. Otherwise, $f^*(x) = g(x) h(x)$, where neither g nor
h is a unit. Since $f^*(x)$ is primitive, however, neither g nor h is a constant;
therefore, each of these has degree less than $\deg(f^*) = \deg(f)$, and so each is a
product of irreducibles, by the inductive hypothesis.

Proposition 7.15 now applies: R[x] is a UFD if (p(x)) is a prime ideal for
every irreducible $p(x) \in R[x]$; that is, if $p(x) \mid f(x) g(x)$, then $p(x) \mid f(x)$ or
$p(x) \mid g(x)$. Let us assume that $p(x) \nmid f(x)$. We may abbreviate f(x) to f in
this proof.

Case (i). Suppose that deg(p) = 0. Write

$$f(x) = c(f) f^*(x) \quad\text{and}\quad g(x) = c(g) g^*(x),$$

where $c(f), c(g) \in R$, and $f^*(x), g^*(x)$ are primitive. Now $p \mid fg$, so that
$p \mid c(f) c(g) f^*(x) g^*(x)$. Since $f^*(x) g^*(x)$ is primitive, we must have c(f)c(g) an
associate of c(fg), by Lemma 7.20(ii). However, if $p \mid f(x) g(x)$, then p divides
each coefficient of fg; that is, p is a common divisor of all the coefficients of fg,
and hence $p \mid c(fg) = c(f) c(g)$ in R, which is a UFD. But Proposition 7.15
says that (p) is a prime ideal in R, and so $p \mid c(f)$ or $p \mid c(g)$. If $p \mid c(f)$,
then $p(x) \mid c(f) f^*(x) = f(x)$, a contradiction. Therefore, $p \mid c(g)$ and, hence,
$p(x) \mid g(x)$, as desired.

Case (ii). Suppose that deg(p) > 0. Let

$$(p, f) = \{ sp + tf : s, t \in R[x] \};$$

of course, (p, f) is an ideal containing p(x) and f(x). Choose $m(x) \in (p, f)$
of minimal degree. If Q = Frac(R) is the fraction field of R, then the division
algorithm in Q[x] gives polynomials $q'(x), r'(x) \in Q[x]$ with
$f(x) = m(x) q'(x) + r'(x)$, where either $r'(x) = 0$ or $\deg(r') < \deg(m)$. Clearing
denominators, there are polynomials $q(x), r(x) \in R[x]$ and a constant $b \in R$ with

$$b f(x) = q(x) m(x) + r(x),$$

where r(x) = 0 or deg(r) < deg(m). Since $m \in (p, f)$, there are polynomials
$s(x), t(x) \in R[x]$ with $m(x) = s(x) p(x) + t(x) f(x)$; hence

$$r = bf - qm = bf - q(sp + tf) = (b - tq) f - sqp \in (p, f).$$

Since m has minimal degree in (p, f), we must have r = 0; that is,
$b f(x) = m(x) q(x)$, and so $b f(x) = c(m) m^*(x) q(x)$. But $m^*(x)$ is primitive, and
$m^*(x) \mid b f(x)$, so that $m^*(x) \mid f(x)$, by Lemma 7.20(iii). A similar argument, replacing
f(x) by p(x), gives $m^*(x) \mid p(x)$. Since p(x) is irreducible, its only factors are
units and associates. If $m^*(x)$ were an associate of p(x), then the equation

$$m(x) = c(m) m^*(x) = s(x) p(x) + t(x) f(x)$$

would give $p(x) \mid f(x)$, contrary to the hypothesis. Hence, $m^*(x)$ must be a
unit; that is, $m(x) = c(m) \in R$, and so (p, f) contains the nonzero constant
c(m). Now $c(m) = sp + tf$, and so

$$c(m) g = spg + tfg.$$

Since $p(x) \mid f(x) g(x)$, we have $p \mid c(m) g$. But p(x) is primitive, because it is
irreducible, and so $p(x) \mid g(x)$, by Lemma 7.20(iii). This completes the proof. •


It follows from Proposition 7.17 that if R is a UFD, then gcd’s exist in R[x].

Corollary 7.22. If k is a field, then $k[x_1, \ldots, x_n]$ is a UFD.

Proof. The proof is by induction on $n \ge 1$. We proved, in Chapter 3, that the
polynomial ring $k[x_1]$ in one variable is a UFD. For the inductive step, recall that
$k[x_1, \ldots, x_n, x_{n+1}] = R[x_{n+1}]$, where $R = k[x_1, \ldots, x_n]$. By induction, R is a
UFD, and by Theorem 7.21, so is $R[x_{n+1}]$. •

The theorem of Gauss, Theorem 3.97, can be generalized. 

Corollary 7.23. Let R be a UFD, let Q = Frac(R), and let $f(x) \in R[x]$. If

$$f(x) = G(x) H(x) \quad\text{in } Q[x],$$

then there is a factorization

$$f(x) = g(x) h(x) \quad\text{in } R[x],$$

where deg(g) = deg(G) and deg(h) = deg(H); in fact, g(x) and G(x) are
associates in Q[x], as are h(x) and H(x).

Therefore, if f(x) does not factor into polynomials of smaller degree in R[x],
then f(x) is irreducible in Q[x].

Proof. By Lemma 7.20, there is a factorization

$$f(x) = c(G) c(H) G^*(x) H^*(x) \quad\text{in } Q[x],$$

where $G^*(x), H^*(x) \in R[x]$ are primitive polynomials. But $c(G) c(H) = c(f)$,
by Lemma 7.20(ii). Since $c(f) \in R$, there is a factorization $f(x) = g(x) h(x)$ in
R[x], where $g(x) = c(f) G^*(x)$ and $h(x) = H^*(x)$. •
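The passage from a factorization in Q[x] to one in R[x] can be followed by hand when $R = \mathbb{Z}$ and $Q = \mathbb{Q}$. The sketch below makes several assumptions of our own: polynomials are coefficient lists (lowest degree first), the content of a rational polynomial is taken to be the positive rational c with $G = c\,G^*$ and $G^*$ primitive in $\mathbb{Z}[x]$, and the function names are hypothetical.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def lcm(a, b):
    return a * b // gcd(a, b)


def rational_content(coeffs):
    """Positive c in Q with coeffs = c * (primitive polynomial in Z[x]); coeffs nonzero."""
    den = reduce(lcm, (c.denominator for c in coeffs), 1)
    ints = [int(c * den) for c in coeffs]      # clear denominators
    return Fraction(reduce(gcd, ints, 0), den)


def primitive_part(coeffs):
    c = rational_content(coeffs)
    return [int(a / c) for a in coeffs]


def poly_mul(f, g):
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out


# f(x) = 6x^2 + 5x + 1 lies in Z[x] and factors in Q[x] as G(x)H(x).
G = [Fraction(3, 2), Fraction(3)]    # 3x + 3/2
H = [Fraction(2, 3), Fraction(2)]    # 2x + 2/3
f = poly_mul(G, H)

cf = rational_content(G) * rational_content(H)   # plays the role of c(f)
assert cf.denominator == 1                       # c(f) is an integer because f is in Z[x]
g = [int(cf) * a for a in primitive_part(G)]     # g = c(f) G*
h = primitive_part(H)                            # h = H*
print(g, h, poly_mul(g, h))
```

Here g(x) = 2x + 1 and h(x) = 3x + 1 have the same degrees as G and H and differ from them only by the rational constants 3/2 and 2/3, as the corollary predicts.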

Irreducibility of a polynomial in several variables is more difficult to deter- 
mine than irreducibility of a polynomial of one variable, but here is one criterion. 

Corollary 7.24. Let k be a field and let $f(x_1, \ldots, x_n)$ be a primitive polynomial
in $R[x_n]$, where $R = k[x_1, \ldots, x_{n-1}]$. If f cannot be factored into two
polynomials of lower degree in $R[x_n]$, then f is irreducible in $k[x_1, \ldots, x_n]$.

Proof. Let us write $f(x_1, \ldots, x_n) = F(x_n)$, for we wish to view f as a polynomial
in $R[x_n]$; that is, we view f as a polynomial in $x_n$ having coefficients
in $k[x_1, \ldots, x_{n-1}]$. Suppose that $F(x_n) = G(x_n) H(x_n)$; by hypothesis, the
degrees of G and H (in $x_n$) cannot both be less than deg(F), and so one of
them, say, G, has degree 0. It follows, because F is primitive, that G is a
unit in $k[x_1, \ldots, x_{n-1}]$. Therefore, $f(x_1, \ldots, x_n)$ is irreducible in
$R[x_n] = k[x_1, \ldots, x_n]$. •

Of course, the corollary applies to any variable $x_i$, not just to $x_n$.

Example 7.25.

We claim that $f(x, y) = x^2 + y^2 - 1 \in k[x, y]$ is irreducible, where k is a field of
characteristic not 2. Write Q = k(y) = Frac(k[y]), and view $f(x, y) \in Q[x]$.
Now the quadratic $g(x) = x^2 + (y^2 - 1)$ is irreducible in Q[x] if and only if it
has no roots in Q = k(y), and this is so, by Exercise 3.62 on page 271. Since f is
monic in x, it is primitive in k[y][x], and any factorization into polynomials of
lower degree in k[y][x] would also be one in Q[x]; hence, f is irreducible in
k[x, y], by Corollary 7.24.

Since k[x, y] is a UFD, it follows from Proposition 7.15 that $(x^2 + y^2 - 1)$
is a prime ideal because it is generated by an irreducible polynomial. ◄


Exercises 

7.16 In any commutative ring R, prove that if a gcd of any two elements always exists,
then a gcd of any finite number of elements also exists.

*7.17 Let R be a UFD and let Q = Frac(R) be its fraction field. Prove that each nonzero
$a/b \in Q$ has an expression in lowest terms; that is, a and b are relatively prime.

*7.18 Let R be a UFD, and let $a, b, c \in R$. If a and b are relatively prime, and if $a \mid bc$,
prove that $a \mid c$.

7.19 If R is a domain, prove that the only units in $R[x_1, \ldots, x_n]$ are units in R.

7.20 If R is a UFD and $f(x), g(x) \in R[x]$, prove that c(fg) and c(f)c(g) are associates.

*7.21 (i) Prove that x and y are relatively prime in k[x, y], where k is a field.

(ii) Prove that 1 is not a linear combination of x and y in k[x, y].

7.22 Prove that $\mathbb{Z}[x_1, \ldots, x_n]$ is a UFD for all $n \ge 1$.

7.23 Let k be a field and let $f(x_1, \ldots, x_n) \in k[x_1, \ldots, x_n]$ be a primitive polynomial
in $R[x_n]$, where $R = k[x_1, \ldots, x_{n-1}]$. If f is either quadratic or cubic in
$x_n$, prove that f is irreducible in $k[x_1, \ldots, x_n]$ if and only if f has no roots in
$k(x_1, \ldots, x_{n-1})$.

7.24 Let $f(x_1, \ldots, x_n) = x_n g(x_1, \ldots, x_{n-1}) + h(x_1, \ldots, x_{n-1})$, where (g, h) = 1.

(i) Prove that f is irreducible in $k[x_1, \ldots, x_n]$.

(ii) Prove that $xy^2 + z$ is an irreducible polynomial in k[x, y, z].

7.25 (Eisenstein’s criterion) Let R be a UFD with Q = Frac(R), and let
$f(x) = a_0 + a_1 x + \cdots + a_n x^n \in R[x]$. Prove that if there is an irreducible element $p \in R$
with $p \mid a_i$ for all i < n but with $p \nmid a_n$ and $p^2 \nmid a_0$, then f(x) is irreducible in
Q[x].

7.26 Prove that

$$f(x, y) = xy^3 + x^2 y^2 - x^5 y + x^2 + 1$$

is an irreducible polynomial in $\mathbb{R}[x, y]$.


7.3 Noetherian Rings 

One of the most important properties of $k[x_1, \ldots, x_n]$, when k is a field, is that
every ideal in it can be generated by a finite number of elements. This property
is intimately related to chains of ideals, which we have already met in the course
of proving that PID’s are UFD’s (I apologize for so many acronyms).

Definition. A commutative ring R satisfies the ACC, the ascending chain
condition, if every ascending chain of ideals

$$I_1 \subseteq I_2 \subseteq \cdots \subseteq I_n \subseteq \cdots$$

stops; that is, the sequence is constant from some point on: there is an integer N
with $I_N = I_{N+1} = I_{N+2} = \cdots$.

The proof of Lemma 7.14 shows that every PID satisfies the ACC. 

Here is an important type of ideal. 

Definition. An ideal I in a commutative ring R is called finitely generated if
there are finitely many elements $a_1, \ldots, a_n \in I$ with

$$I = \Bigl\{ \sum_i r_i a_i : r_i \in R \text{ for all } i \Bigr\};$$

that is, every element in I is a linear combination of the $a_i$’s. One writes

$$I = (a_1, \ldots, a_n)$$

and calls I the ideal generated by $a_1, \ldots, a_n$. A set of generators $a_1, \ldots, a_n$
of an ideal I is sometimes called a basis of I (although this is a weaker notion
than that of a basis of a vector space because uniqueness of expression is not
assumed).

Every ideal I in a PID can be generated by one element, and so I is finitely
generated.

Proposition 7.26. The following conditions are equivalent for a commutative
ring R.

(i) R has the ACC.

(ii) R satisfies the maximum condition: every nonempty family $\mathcal{F}$ of ideals in
R has a maximal element; that is, there is some $I_0 \in \mathcal{F}$ for which there is
no $J \in \mathcal{F}$ with $I_0 \subsetneq J$.

(iii) Every ideal in R is finitely generated.

Proof. (i) ⇒ (ii): Let $\mathcal{F}$ be a nonempty family of ideals in R, and assume that $\mathcal{F}$ has no
maximal element. Choose $I_1 \in \mathcal{F}$. Since $I_1$ is not a maximal element, there is
$I_2 \in \mathcal{F}$ with $I_1 \subsetneq I_2$. Now $I_2$ is not a maximal element in $\mathcal{F}$, and so there is
$I_3 \in \mathcal{F}$ with $I_2 \subsetneq I_3$. Continuing in this way, we can construct an ascending
chain of ideals in R that does not stop, contradicting the ACC.

(ii) ⇒ (iii): Let I be an ideal in R, and define $\mathcal{F}$ to be the family of all the
finitely generated ideals contained in I; of course, $\mathcal{F} \ne \varnothing$. By hypothesis, there
exists a maximal element $M \in \mathcal{F}$. Now $M \subseteq I$ because $M \in \mathcal{F}$. If $M \subsetneq I$, then
there is $a \in I$ with $a \notin M$. The ideal

$$J = \{ m + ra : m \in M \text{ and } r \in R \} \subseteq I$$

is finitely generated, and so $J \in \mathcal{F}$; but $M \subsetneq J$, and this contradicts the
maximality of M. Therefore, M = I, and so I is finitely generated.

(iii) ⇒ (i): Assume that every ideal in R is finitely generated, and let

$$I_1 \subseteq I_2 \subseteq \cdots \subseteq I_n \subseteq \cdots$$

be an ascending chain of ideals in R. As in the proof of Lemma 7.14, we show
that $J = \bigcup_n I_n$ is an ideal. If $a \in J$, then $a \in I_n$ for some n; if $r \in R$, then
$ra \in I_n$, because $I_n$ is an ideal; hence, $ra \in J$. If $a, b \in J$, then there are ideals
$I_n$ and $I_m$ with $a \in I_n$ and $b \in I_m$; since the chain is ascending, we may assume
that $I_n \subseteq I_m$, and so $a, b \in I_m$. As $I_m$ is an ideal, $a - b \in I_m$ and, hence,
$a - b \in J$. Therefore, J is an ideal.

By hypothesis, there are elements $a_i \in J$ with $J = (a_1, \ldots, a_q)$. Now $a_i$
got into J by being in $I_{n_i}$ for some $n_i$. If N is the largest $n_i$, then $I_{n_i} \subseteq I_N$ for
all i; hence, $a_i \in I_N$ for all i, and so

$$J = (a_1, \ldots, a_q) \subseteq I_N \subseteq J.$$

It follows that if $n \ge N$, then $J = I_N \subseteq I_n \subseteq J$, so that $I_n = J$; therefore, the
chain stops, and R has the ACC. •
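In the PID $\mathbb{Z}$ the stopping of a chain can be watched directly, because $(a_1, \ldots, a_n)$ is generated by the gcd of its generators. The sketch below assumes this fact and uses hypothetical helper names; it examines only a finite initial segment of a chain, so it illustrates the ACC rather than proving anything.

```python
from itertools import accumulate
from math import gcd


def chain_generators(seq):
    """Generators d_n of I_n = (a_1, ..., a_n) in Z, using (a, b) = (gcd(a, b))."""
    return list(accumulate(seq, gcd))


def stabilization_index(gens):
    """Smallest 1-based N with I_N = I_{N+1} = ... among the listed ideals."""
    last = gens[-1]
    # Each d_{n+1} divides d_n, so once the final value appears it never changes.
    return next(i for i, d in enumerate(gens, start=1) if d == last)


seq = [360, 84, 90, 18, 30, 12]
gens = chain_generators(seq)
print(gens, stabilization_index(gens))
```

For this sequence the ideals are (360) ⊊ (12) ⊊ (6) = (6) = (6) = (6): the chain is constant from N = 3 on.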

We now give a name to a commutative ring which satisfies any of the three 
equivalent conditions in the proposition. 

Definition. A commutative ring R is called noetherian¹ if every ideal in R is
finitely generated.

Corollary 7.27. If I is a proper ideal in a nonzero noetherian ring R, then there
exists a maximal ideal M in R containing I. In particular, every noetherian ring
has maximal ideals.²

¹ This name honors Emmy Noether (1882-1935), who introduced chain conditions in 1921.
² This corollary is true without assuming R is noetherian, but the proof of the general result
needs Zorn’s lemma.

Proof. Let $\mathcal{F}$ be the family of all those proper ideals in R which contain I; note
that $\mathcal{F} \ne \varnothing$ because $I \in \mathcal{F}$. Since R is noetherian, the maximum condition gives
a maximal element M in $\mathcal{F}$. We must still show that M is a maximal ideal in R
(that is, that M is actually a maximal element in the larger family $\mathcal{F}'$ consisting
of all the proper ideals in R). Suppose there is a proper ideal J with $M \subseteq J$.
Then $I \subseteq J$, and so $J \in \mathcal{F}$; therefore, maximality of M gives M = J, and so M
is a maximal ideal in R. •

Remark. Zorn’s lemma is related to the maximum condition, statement (ii) in 
Proposition 7.26. 

Definition. A partially ordered set is a nonempty set X equipped with a relation
$x \preceq y$ such that, for all $x, y, z \in X$, we have

(i) reflexivity: $x \preceq x$;

(ii) antisymmetry: if $x \preceq y$ and $y \preceq x$, then x = y;

(iii) transitivity: if $x \preceq y$ and $y \preceq z$, then $x \preceq z$.

An element u in a partially ordered set X is called a maximal element if
there is no $x \in X$ with $u \preceq x$ and $u \ne x$.

If A is a set, then the family $\mathcal{P}(A)$ of all the subsets of A is a partially
ordered set if one defines $U \preceq V$ to mean $U \subseteq V$, where U and V are subsets of
A; the family $\mathcal{P}(A)^*$, consisting of all the proper subsets of A, is also a partially
ordered set (more generally, every nonempty subset of a partially ordered set
is itself a partially ordered set). Another example is the real numbers $\mathbb{R}$, with
$x \preceq y$ meaning $x \le y$. There are some partially ordered sets, e.g., $\mathcal{P}(A)^*$,
having many maximal elements (the complement of a point in A is a maximal
element in $\mathcal{P}(A)^*$), and there are some partially ordered sets, e.g., $\mathbb{R}$, having no
maximal elements. Zorn’s lemma is a condition on a partially ordered set which
guarantees that it has at least one maximal element.
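For a small finite set A, the maximal elements of $\mathcal{P}(A)^*$ can simply be listed. The sketch below is an illustration under our own conventions: subsets are Python frozensets, the order is inclusion, and the function names are hypothetical.

```python
from itertools import combinations


def proper_subsets(A):
    """All proper subsets of the finite set A, as frozensets."""
    items = list(A)
    # Sizes 0, 1, ..., len(A) - 1 exclude A itself.
    return [frozenset(c) for r in range(len(items)) for c in combinations(items, r)]


def maximal_elements(family):
    """Members u for which no member x satisfies u <= x and u != x."""
    return [u for u in family if not any(u < x for x in family)]


A = {1, 2, 3}
maxima = maximal_elements(proper_subsets(A))
print(sorted(sorted(u) for u in maxima))
```

The maximal elements found are exactly the complements of the points of A, and there are several of them: a maximal element need not be a largest element.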

A partially ordered set X is called a chain if, for every $a, b \in X$, either
$a \preceq b$ or $b \preceq a$. (Since every two elements in a chain are comparable, chains
are sometimes called totally ordered sets to contrast them with more general
partially ordered sets.) We can now state Zorn’s lemma.

Zorn’s Lemma. Let X be a partially ordered set in which every chain C has an
upper bound; that is, there exists $x_0 \in X$ with $c \preceq x_0$ for every $c \in C$. Then X
has a maximal element.

It turns out that Zorn’s lemma is equivalent to the Axiom of Choice, which
says that the cartesian product of nonempty sets is itself nonempty.

There is usually no need for Zorn’s lemma when dealing with noetherian
rings, for the maximum condition guarantees the existence of a maximal element
in any nonempty family $\mathcal{F}$ of ideals. ◄

Here is one way to construct a new noetherian ring from an old one. 


Corollary 7.28. If R is a noetherian ring and I is an ideal in R, then R/I is
also noetherian.

Proof. If A is an ideal in R/I, then the correspondence theorem provides an
ideal J in R with J/I = A. Since R is noetherian, the ideal J is finitely generated,
say, $J = (b_1, \ldots, b_n)$, and so A = J/I is also finitely generated (by the
cosets $b_1 + I, \ldots, b_n + I$). Therefore, R/I is noetherian. •

In 1890, Hilbert proved the famous Hilbert basis theorem, showing that every
ideal in $\mathbb{C}[x_1, \ldots, x_n]$ is finitely generated. As we shall see, the proof is
nonconstructive in the sense that it does not give an explicit set of generators of 
an ideal. It is reported that when P. Gordan, one of the leading algebraists of 
the time, first saw Hilbert’s proof, he said, “Das ist nicht Mathematik. Das ist 
Theologie!” (“This is not mathematics. This is theology!”). On the other hand, 
Gordan said, in 1899 when he published a simplified proof of Hilbert’s theorem, 
“I have convinced myself that theology also has its merits.” 

The following elegant proof of Hilbert’s theorem is due to H. Sarges. 


Lemma 7.29. A commutative ring R is noetherian if and only if for every
sequence $a_1, \ldots, a_n, \ldots$ of elements in R, there exist $m \ge 1$ and $r_1, \ldots, r_m \in R$
with $a_{m+1} = r_1 a_1 + \cdots + r_m a_m$.

Proof. Assume that R is noetherian and that $a_1, \ldots, a_n, \ldots$ is a sequence of
elements in R. If $I_n = (a_1, \ldots, a_n)$, then there is an ascending chain of ideals,
$I_1 \subseteq I_2 \subseteq \cdots$. By the ACC, there exists $m \ge 1$ with $I_m = I_{m+1}$. Therefore,
$a_{m+1} \in I_{m+1} = I_m$, and so there are $r_i \in R$ with $a_{m+1} = r_1 a_1 + \cdots + r_m a_m$.

Conversely, suppose that R satisfies the condition on sequences of elements.
If R is not noetherian, then there is an ascending chain of ideals $I_1 \subseteq I_2 \subseteq \cdots$
which does not stop. Deleting any repetitions if necessary, we may assume that
$I_n \subsetneq I_{n+1}$ for all n. For each n, choose $a_{n+1} \in I_{n+1}$ with $a_{n+1} \notin I_n$. By
hypothesis, there exist m and $r_i \in R$ for $i \le m$ with $a_{m+1} = \sum_{i \le m} r_i a_i \in I_m$.
This contradiction implies that R is noetherian. •
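In $\mathbb{Z}$, which is noetherian, the integer m and the coefficients $r_i$ promised by the lemma can be computed with the extended Euclidean algorithm. The following sketch assumes a finite list of integers in place of an infinite sequence; the names `ext_gcd` and `first_dependence` are ours.

```python
def ext_gcd(a, b):
    """Return (g, s, t) with g = gcd(a, b) >= 0 and s*a + t*b == g."""
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    if old_r < 0:
        old_r, old_s, old_t = -old_r, -old_s, -old_t
    return old_r, old_s, old_t


def first_dependence(seq):
    """Least m >= 1 and [r_1, ..., r_m] with a_{m+1} = sum r_i a_i, or None."""
    g, coeffs = 0, []   # invariant: g == sum(c * a) over the elements seen so far
    for m, a in enumerate(seq):
        # a is a_{m+1}; it lies in (a_1, ..., a_m) = (g) exactly when g divides a.
        if m >= 1 and ((g == 0 and a == 0) or (g != 0 and a % g == 0)):
            scale = a // g if g != 0 else 0
            return m, [scale * c for c in coeffs]
        g, s, t = ext_gcd(g, a)
        coeffs = [s * c for c in coeffs] + [t]
    return None


seq = [6, 10, 15, 7]
m, r = first_dependence(seq)
print(m, r, sum(ri * ai for ri, ai in zip(r, seq)))
```

For the sequence 6, 10, 15, 7 the ideals are (6) ⊊ (2) ⊊ (1), so the first dependence occurs at m = 3, where 7 is written as an integer combination of 6, 10, and 15.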


Theorem 7.30 (Hilbert Basis Theorem). If R is a commutative noetherian
ring, then R[x] is also noetherian.

Proof. Assume that I is an ideal in R[x] that is not finitely generated; of course,
$I \ne \{0\}$. Define $f_0(x)$ to be a polynomial in I of minimal degree and define,
inductively, $f_{n+1}(x)$ to be a polynomial of minimal degree in $I - (f_0, \ldots, f_n)$.
Note that $f_n(x)$ exists for all $n \ge 0$; if $I - (f_0, \ldots, f_n)$ were empty, then I would
be finitely generated. It is clear that

$$\deg(f_0) \le \deg(f_1) \le \deg(f_2) \le \cdots.$$

Let $a_n$ denote the leading coefficient of $f_n(x)$. Since R is noetherian, Lemma 7.29
applies to give an integer m with $a_{m+1} \in (a_0, \ldots, a_m)$; that is, there are $r_i \in R$
with $a_{m+1} = r_0 a_0 + \cdots + r_m a_m$. Define

$$f^*(x) = f_{m+1}(x) - \sum_{i=0}^{m} x^{d_{m+1} - d_i} r_i f_i(x),$$

where $d_i = \deg(f_i)$. Now $f^*(x) \in I - (f_0(x), \ldots, f_m(x))$, otherwise
$f_{m+1}(x) \in (f_0(x), \ldots, f_m(x))$. It suffices to show that $\deg(f^*) < \deg(f_{m+1})$, for this
contradicts $f_{m+1}(x)$ having minimal degree among polynomials in I that are not in
$(f_0, \ldots, f_m)$. If $f_i(x) = a_i x^{d_i} + \text{lower terms}$, then

$$\begin{aligned}
f^*(x) &= f_{m+1}(x) - \sum_{i=0}^{m} x^{d_{m+1} - d_i} r_i f_i(x) \\
&= (a_{m+1} x^{d_{m+1}} + \text{lower terms}) - \sum_{i=0}^{m} x^{d_{m+1} - d_i} r_i (a_i x^{d_i} + \text{lower terms}).
\end{aligned}$$

The leading term being subtracted is thus $\sum_{i=0}^{m} r_i a_i x^{d_{m+1}} = a_{m+1} x^{d_{m+1}}$. •
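The cancellation of leading terms at the heart of this proof can be seen on a toy example in $\mathbb{Z}[x]$. The sketch below assumes polynomials are coefficient lists (lowest degree first) and that the coefficients $r_i$ are already known; it only exhibits the drop in degree and does not test membership in any ideal. All function names are hypothetical.

```python
def degree(f):
    """Degree of a nonzero polynomial stored lowest degree first."""
    return max(i for i, c in enumerate(f) if c != 0)


def shift_scale(f, k, r):
    """Return r * x^k * f(x)."""
    return [0] * k + [r * c for c in f]


def subtract(f, g):
    n = max(len(f), len(g))
    f = f + [0] * (n - len(f))
    g = g + [0] * (n - len(g))
    return [a - b for a, b in zip(f, g)]


def cancel_leading_term(fs, target, rs):
    """f* = target - sum_i x^(d_target - d_i) * r_i * f_i."""
    d = degree(target)
    out = list(target)
    for f, r in zip(fs, rs):
        out = subtract(out, shift_scale(f, d - degree(f), r))
    return out


f0 = [1, 2]          # 2x + 1,    a_0 = 2, d_0 = 1
f1 = [0, 1, 3]       # 3x^2 + x,  a_1 = 3, d_1 = 2
f2 = [5, 0, 0, 1]    # x^3 + 5,   a_2 = 1 = (-1)*2 + 1*3
f_star = cancel_leading_term([f0, f1], f2, [-1, 1])
print(f_star, degree(f_star), degree(f2))
```

Here $f^* = 5$ has degree 0, strictly less than $\deg(f_2) = 3$, because the terms in $x^3$ cancel exactly as in the displayed computation.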

Corollary 7.31.

(i) If k is a field, then $k[x_1, \ldots, x_n]$ is noetherian.

(ii) The ring $\mathbb{Z}[x_1, \ldots, x_n]$ is noetherian.

(iii) For any ideal I in $k[x_1, \ldots, x_n]$, where $k = \mathbb{Z}$ or k is a field, the quotient
ring $k[x_1, \ldots, x_n]/I$ is noetherian.

Proof. The proofs of the first two items are by induction on $n \ge 1$, using the
theorem, while the proof of item (iii) follows from Corollary 7.28. •


Exercises 


7.27 Let m be a positive integer, and let X be the set of all its (positive) divisors. Prove
that X is a partially ordered set if one defines $a \preceq b$ to mean $a \mid b$.

7.28 Prove that the ring of Example 3.11 on page 222 is not a noetherian ring.

7.29 Prove that if R is a noetherian ring, then the ring of formal power series R[[x]] is
also a noetherian ring.

7.30 Let

$$S^2 = \{ (a, b, c) \in \mathbb{R}^3 : a^2 + b^2 + c^2 = 1 \}$$

be the unit sphere in $\mathbb{R}^3$, and let

$$I = \{ f(x, y, z) \in \mathbb{R}[x, y, z] : f(a, b, c) = 0 \text{ for all } (a, b, c) \in S^2 \}.$$

Prove that I is a finitely generated ideal in $\mathbb{R}[x, y, z]$.

7.31 If R and S are noetherian rings, prove that their direct product R × S is also a
noetherian ring.

7.32 If R is a ring that is also a vector space over a field k, then R is called a
k-algebra if

$$(au)v = a(uv) = u(av)$$

for all $a \in k$ and $u, v \in R$. Prove that every finite-dimensional k-algebra is a
noetherian ring.


7.4 Varieties 

Analytic geometry gives pictures of equations. For example, we picture a function
$f : \mathbb{R} \to \mathbb{R}$ as its graph, which consists of all the ordered pairs (a, f(a)) in
the plane; that is, f is the set of all the solutions $(a, b) \in \mathbb{R}^2$ of

$$g(x, y) = y - f(x) = 0.$$

We can also picture equations that are not graphs of functions. For example, the
set of all the zeros of the polynomial

$$h(x, y) = x^2 + y^2 - 1$$

is the unit circle. One can also picture simultaneous solutions in $\mathbb{R}^2$ of several
polynomials of two variables, and, indeed, one can picture simultaneous solutions
in $\mathbb{R}^n$ of several polynomials of n variables.

Notation. Let k be a field and let $k^n$ denote the set of all n-tuples

$$k^n = \{ a = (a_1, \ldots, a_n) : a_i \in k \text{ for all } i \}.$$

The polynomial ring $k[x_1, \ldots, x_n]$ in several variables may be denoted by k[X],
where X is the abbreviation:

$$X = (x_1, \ldots, x_n).$$

In particular, $f(X) \in k[X]$ may abbreviate $f(x_1, \ldots, x_n) \in k[x_1, \ldots, x_n]$.

In what follows, we regard polynomials $f(x_1, \ldots, x_n) \in k[x_1, \ldots, x_n]$ as
functions of n variables $k^n \to k$. Here is the precise definition.

Definition. A polynomial $f(X) \in k[X]$ determines a polynomial function
$f^\flat : k^n \to k$ in the obvious way: if
$f(x_1, \ldots, x_n) = \sum_{e_1, \ldots, e_n} b_{e_1, \ldots, e_n} x_1^{e_1} \cdots x_n^{e_n}$
and $(a_1, \ldots, a_n) \in k^n$, then

$$f^\flat : (a_1, \ldots, a_n) \mapsto f(a_1, \ldots, a_n) = \sum_{e_1, \ldots, e_n} b_{e_1, \ldots, e_n} a_1^{e_1} \cdots a_n^{e_n}.$$

The next proposition generalizes Corollary 3.52 from one variable to several
variables.

Proposition 7.32. Let k be an infinite field and let $k[X] = k[x_1, \ldots, x_n]$. If
$f(X), g(X) \in k[X]$ satisfy $f^\flat = g^\flat$, then $f(x_1, \ldots, x_n) = g(x_1, \ldots, x_n)$.

Proof. The proof is by induction on $n \ge 1$; the base step is Corollary 3.52. For
the inductive step, write

$$f(X, y) = \sum_i p_i(X) y^i \quad\text{and}\quad g(X, y) = \sum_i q_i(X) y^i,$$

where X denotes $(x_1, \ldots, x_n)$. If $f^\flat = g^\flat$, then we have $f(a, \alpha) = g(a, \alpha)$ for
every $a \in k^n$ and every $\alpha \in k$. For fixed $a \in k^n$, define $F_a(y) = \sum_i p_i(a) y^i$
and $G_a(y) = \sum_i q_i(a) y^i$. Since both $F_a(y)$ and $G_a(y)$ are in k[y], the base step
gives $p_i(a) = q_i(a)$ for all $a \in k^n$. By the inductive hypothesis, $p_i(X) = q_i(X)$
for all i, and hence

$$f(X, y) = \sum_i p_i(X) y^i = \sum_i q_i(X) y^i = g(X, y),$$

as desired. •
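The hypothesis that k be infinite cannot be dropped, and a finite field shows why. The sketch below assumes $k = \mathbb{F}_5$, represented by the integers 0, ..., 4 with arithmetic mod 5, and tabulates polynomial functions; `poly_function` is a hypothetical helper name.

```python
def poly_function(coeffs, p):
    """Value table on F_p of the polynomial with these coefficients (lowest degree first)."""
    return [sum(c * pow(a, i, p) for i, c in enumerate(coeffs)) % p for a in range(p)]


p = 5
f = [0, -1, 0, 0, 0, 1]   # x^5 - x, a nonzero polynomial in F_5[x]
zero = [0]                # the zero polynomial

print(poly_function(f, p), poly_function(zero, p))
```

By Fermat's theorem, $a^5 = a$ for every $a \in \mathbb{F}_5$, so $x^5 - x$ and 0 determine the same polynomial function even though they are different polynomials.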

As a consequence of this last proposition, we drop the $f^\flat$ notation and identify
polynomials with their polynomial functions when k is infinite. We note that
algebraically closed fields are always infinite (for any prime power q and any
power $q^r$ with $r \ge 2$, there is a field extension $\mathbb{F}_{q^r} / \mathbb{F}_q$, and if $\alpha \in \mathbb{F}_{q^r}$ does not lie in $\mathbb{F}_q$,
then $\mathrm{irr}(\alpha, \mathbb{F}_q)$ is an irreducible polynomial in $\mathbb{F}_q[x]$ of degree at least 2, which
therefore has no root in $\mathbb{F}_q$); thus,
Proposition 7.32 applies whenever k is algebraically closed.

Definition. If $f(X) \in k[X] = k[x_1, \ldots, x_n]$ and f(a) = 0, where $a \in k^n$,
then a is called a zero of f(X). [If f(x) is a polynomial in one variable, then a
zero of f(x) is also called a root of f(x).]

Proposition 7.33. If k is an algebraically closed field and $f(X) \in k[X]$ is not
a constant, then f(X) has a zero.

Proof. We prove the result by induction on $n \ge 1$, where $X = (x_1, \ldots, x_n)$.
The base step follows at once from our assuming that $k^1 = k$ is algebraically
closed. As in the proof of Proposition 7.32, write

$$f(X, y) = \sum_i g_i(X) y^i.$$

For each $a \in k^n$, define $f_a(y) = \sum_i g_i(a) y^i$. If f(X, y) has no zeros, then each
$f_a(y) \in k[y]$ has no zeros, and the base step says that $f_a(y)$ is a nonzero constant
for all $a \in k^n$. Thus, $g_i(a) = 0$ for all i > 0 and all $a \in k^n$. By Proposition 7.32,
which applies because algebraically closed fields are infinite, $g_i(X) = 0$ for all
i > 0, and so $f(X, y) = g_0(X) y^0 = g_0(X)$. By the inductive hypothesis, $g_0(X)$
is a nonzero constant, and the proof is complete. •

We now give some general definitions describing solution sets of polynomi- 
als. 

Definition. If $F \subseteq k[X] = k[x_1, \ldots, x_n]$, then the algebraic set defined by F
is

$$\mathrm{Var}(F) = \{ a \in k^n : f(a) = 0 \text{ for every } f(X) \in F \};$$

thus, Var(F)³ consists of all those $a \in k^n$ which are zeros of every $f(X) \in F$.

Example 7.34.

(i) Here is an algebraic set defined by two equations.

$$\mathrm{Var}(x, y) = \{ (a, b) \in k^2 : a = 0 \text{ and } b = 0 \}.$$

Thus, Var(x, y) = {(0, 0)} is the intersection of the two coordinate axes. Their
union, x-axis ∪ y-axis, is Var(xy), for ab = 0 if and only if a = 0 or b = 0.
More generally, any finite union of algebraic sets is an algebraic set.

(ii) The n-sphere $S^n$ is defined as

$$S^n = \Bigl\{ (x_1, \ldots, x_{n+1}) \in k^{n+1} : \sum_{i=1}^{n+1} x_i^2 = 1 \Bigr\}.$$

More generally, define a hypersurface in $k^n$ to be the algebraic set defined
as all the zeros of a single polynomial in k[X].

(iii) Let A be an m × n matrix with entries in k. A system of m equations in n
unknowns,

$$AX = B,$$

where B is an m × 1 column matrix, defines an algebraic set, Var(AX = B),
which is a subset of $k^n$. Of course, AX = B is really shorthand for a set of
m linear equations in n variables, and Var(AX = B) is usually called the
solution set of the system AX = B; when this system is homogeneous,
that is, when B = 0, then Var(AX = 0) is a subspace of $k^n$, called the
solution space of the system. ◄

³ The notation Var(F) arises from variety, which is a special kind of algebraic set to be
defined later.

The next result shows that, as far as algebraic sets are concerned, one may
just as well assume the subsets F of k[X] are ideals of k[X].

Proposition 7.35.

(i) If $F \subseteq G \subseteq k[X]$, then $\mathrm{Var}(G) \subseteq \mathrm{Var}(F)$.

(ii) If $F \subseteq k[X]$ and I = (F) is the ideal generated by F, then

$$\mathrm{Var}(F) = \mathrm{Var}(I).$$

Hence, every algebraic set can be defined by a finite number of equations.

Proof.

(i) If $a \in \mathrm{Var}(G)$, then g(a) = 0 for all $g(X) \in G$; since $F \subseteq G$, it follows, in
particular, that f(a) = 0 for all $f(X) \in F$.

(ii) Since $F \subseteq (F) = I$, we have $\mathrm{Var}(I) \subseteq \mathrm{Var}(F)$, by part (i). For the reverse
inclusion, let $a \in \mathrm{Var}(F)$, so that f(a) = 0 for every $f(X) \in F$. If $g(X) \in I$,
then $g(X) = \sum_i r_i(X) f_i(X)$, where $r_i(X) \in k[X]$ and $f_i(X) \in F$; hence,
$g(a) = \sum_i r_i(a) f_i(a) = 0$ and $a \in \mathrm{Var}(I)$.

If I is an ideal in k[X], then the Hilbert basis theorem says that I is finitely
generated; that is, there is a finite subset $F \subseteq I$ with $\mathrm{Var}(I) = \mathrm{Var}(F)$. •

It follows that not every subset of $k^n$ is an algebraic set. For example, if
n = 1, then k[x] is a PID. Hence, if F is a subset of k[x], then (F) = (g(x)) for
some $g(x) \in k[x]$, and so

$$\mathrm{Var}(F) = \mathrm{Var}((F)) = \mathrm{Var}((g)) = \mathrm{Var}(g).$$

But g(x), if nonzero, has only a finite number of roots, and so Var(F) is finite
(or all of k, when g = 0). If k is algebraically closed, then it is an infinite field,
and so most subsets of $k^1 = k$ are not algebraic sets.

In spite of our wanting to draw pictures in the plane, there is a major defect
with $k = \mathbb{R}$: some polynomials have no zeros. For example, $f(x) = x^2 + 1$
has no real roots, and so $\mathrm{Var}(x^2 + 1) = \varnothing$. More generally,
$g(x_1, \ldots, x_n) = x_1^2 + \cdots + x_n^2 + 1$ has no zeros in $\mathbb{R}^n$, and so $\mathrm{Var}(g(X)) = \varnothing$. Since we are
dealing with (not necessarily linear) polynomials, it is a natural assumption to
want all their zeros available. For polynomials in one variable, this amounts to
saying that k is algebraically closed and, in light of Proposition 7.33, we know
that $\mathrm{Var}(f(X)) \ne \varnothing$ for every nonconstant $f(X) \in k[X]$ in this case. Of course,
algebraic sets are of interest for all fields k, but it makes more sense to consider
the simplest case before trying to understand more complicated problems. On
the other hand, many of the first results below are valid for any field k. We will
state hypotheses needed for each proposition, but the reader should realize that
the most important case is when k is algebraically closed.

Here are some elementary properties of Var. 

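Proposition 7.36. Let k be a field.

(i) $\mathrm{Var}(x_1, x_1 - 1) = \varnothing$ and $\mathrm{Var}(0) = k^n$, where 0 is the zero polynomial.

(ii) If I and J are ideals in k[X], then

$$\mathrm{Var}(IJ) = \mathrm{Var}(I \cap J) = \mathrm{Var}(I) \cup \mathrm{Var}(J),$$

where $IJ = \{ \sum_i f_i(X) g_i(X) : f_i(X) \in I \text{ and } g_i(X) \in J \}$.

(iii) If $\{ I_\ell : \ell \in L \}$ is a family of ideals in k[X], then

$$\mathrm{Var}\Bigl( \sum_\ell I_\ell \Bigr) = \bigcap_\ell \mathrm{Var}(I_\ell),$$

where $\sum_\ell I_\ell$ is the set of all finite sums of the form $r_{\ell_1} + \cdots + r_{\ell_q}$ with
$r_{\ell_i} \in I_{\ell_i}$.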

Proof. 

(i) If a = (a_1, ..., a_n) ∈ Var(x_1, x_1 - 1), then a_1 = 0 and a_1 = 1; plainly, there
are no such points a, and so Var(x_1, x_1 - 1) = ∅. That Var(0) = k^n is clear, for
every point a ∈ k^n is a zero of the zero polynomial.

(ii) Since IJ ⊆ I ∩ J, it follows that Var(IJ) ⊇ Var(I ∩ J); since I ∩ J ⊆ I and
I ∩ J ⊆ J, it follows that Var(I ∩ J) ⊇ Var(I) and Var(I ∩ J) ⊇ Var(J). Hence,

Var(IJ) ⊇ Var(I ∩ J) ⊇ Var(I) ∪ Var(J).

To complete the proof, it suffices to show that Var(IJ) ⊆ Var(I) ∪ Var(J). If
a ∉ Var(I) ∪ Var(J), then there exist f(X) ∈ I and g(X) ∈ J with f(a) ≠ 0
and g(a) ≠ 0. But f(X)g(X) ∈ IJ and (fg)(a) = f(a)g(a) ≠ 0, because
f(a) and g(a) are nonzero elements of the field k. Therefore, a ∉ Var(IJ), as desired.

(iii) For each ℓ, the inclusion I_ℓ ⊆ Σ_ℓ I_ℓ gives Var(Σ_ℓ I_ℓ) ⊆ Var(I_ℓ), and so

Var(Σ_ℓ I_ℓ) ⊆ ∩_ℓ Var(I_ℓ).

For the reverse inclusion, if g(X) ∈ Σ_ℓ I_ℓ, then there are finitely many ℓ with
g(X) = Σ_ℓ h_ℓ f_ℓ, where h_ℓ ∈ k[X] and f_ℓ(X) ∈ I_ℓ. Therefore, if a ∈
∩_ℓ Var(I_ℓ), then f_ℓ(a) = 0 for all ℓ, and so g(a) = 0; that is, a ∈ Var(Σ_ℓ I_ℓ). •
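
The identities in Proposition 7.36 can be checked by brute force when k is a small finite field. The following is only a sketch under stated assumptions: the field F_5, the two ideals I = (xy) and J = (x - y, x + y - 2), and the helper name var are all chosen here for illustration and do not come from the text. It uses the fact that Var of an ideal equals Var of any generating set, that the products of generators generate IJ, and that the union of the generating sets generates I + J.

```python
from itertools import product

P = 5   # illustrative choice: work over the finite field F_5
N = 2   # number of variables

def var(polys):
    """Common zeros in F_P^N of the given polynomial functions."""
    return {a for a in product(range(P), repeat=N)
            if all(f(*a) % P == 0 for f in polys)}

# Hypothetical generators: I = (x*y) and J = (x - y, x + y - 2).
I_gens = [lambda x, y: x * y]
J_gens = [lambda x, y: x - y, lambda x, y: x + y - 2]

# Products f*g of generators generate IJ; the combined list generates I + J.
IJ_gens = [lambda x, y, f=f, g=g: f(x, y) * g(x, y)
           for f in I_gens for g in J_gens]
sum_gens = I_gens + J_gens

var_I, var_J = var(I_gens), var(J_gens)
var_IJ, var_sum = var(IJ_gens), var(sum_gens)

print(var_IJ == var_I | var_J)    # Var(IJ) = Var(I) ∪ Var(J)
print(var_sum == var_I & var_J)   # Var(I + J) = Var(I) ∩ Var(J)
```

A finite field is used only so that every point can be enumerated; the proposition itself holds over any field.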


Definition. A topology on a set X is a family ℱ of subsets of X, called closed
sets⁴, which satisfy the following axioms:

(i) ∅ ∈ ℱ and X ∈ ℱ;

(ii) if F_1, F_2 ∈ ℱ, then F_1 ∪ F_2 ∈ ℱ; that is, the union of two closed sets is
closed;

(iii) if {F_ℓ : ℓ ∈ L} ⊆ ℱ, then ∩_ℓ F_ℓ ∈ ℱ; that is, any intersection of possibly
infinitely many closed sets is also closed.

A topological space is an ordered pair (X, ℱ), where X is a set and ℱ is a
topology on X.

Proposition 7.36 shows that the family of all algebraic sets is a topology on
k^n; it is called the Zariski topology, and it is very useful in the deeper study of
k[X]. The usual topology on ℝ has many closed sets; for example, every closed
interval is a closed set. In contrast, in the Zariski topology on ℝ, every closed set
(aside from ℝ) is finite.

Given an ideal I in k[X], we have just defined its algebraic set Var(I) ⊆ k^n.
We now reverse direction: given a subset A ⊆ k^n, we assign an ideal in k[X] to
it; in particular, we assign an ideal to every algebraic set.

Definition. If A ⊆ k^n, where k is a field, define its coordinate ring k[A] to be
the commutative ring of all restrictions f|A of polynomial functions f: k^n → k,
under pointwise operations.

The polynomial f(x_1, ..., x_n) = x_i ∈ k[X], when regarded as a polynomial
function, is defined by

x_i : (a_1, ..., a_n) ↦ a_i;

that is, x_i picks out the ith coordinate of a point in k^n. The reason for the name
coordinate ring is that if a ∈ A, then (x_1(a), ..., x_n(a)) describes a.

The function res: k[X] → k[A], given by f(X) ↦ f|A, is a ring homomorphism,
and the kernel of this restriction map is an ideal in k[X].

4 One can also define a topology by specifying its open subsets, which are defined to be
complements of closed sets.

Definition. If A ⊆ k^n, where k is a field, define

Id(A) = {f(X) ∈ k[X] = k[x_1, ..., x_n] : f(a) = 0 for every a ∈ A}.

The Hilbert basis theorem tells us that Id(A) is always a finitely generated 
ideal. 

Proposition 7.37. If A ⊆ k^n, where k is a field, then there is an isomorphism

k[X]/Id(A) ≅ k[A].

Proof. The restriction map res: k[X] → k[A] is a surjection with kernel Id(A),
and so the result follows from the first isomorphism theorem. Note that two
polynomials agreeing on A lie in the same coset of Id(A). •

Although the definition of Var(F) makes sense for any subset F of k[X],
it is most interesting when F is an ideal. Similarly, although the definition of
Id(A) makes sense for any subset A of k^n, it is most interesting when A is an
algebraic set. After all, algebraic sets are comprised of solutions of (polynomial) 
equations, which is what we care about. 

Proposition 7.38. Let k be a field.

(i) Id(∅) = k[X] and, if k is algebraically closed, Id(k^n) = {0}.

(ii) If A ⊆ B are subsets of k^n, then Id(B) ⊆ Id(A).

(iii) If {A_ℓ : ℓ ∈ L} is a family of subsets of k^n, then

Id(∪_ℓ A_ℓ) = ∩_ℓ Id(A_ℓ).


Proof. 

(i) If f(X) ∈ Id(A) for some subset A ⊆ k^n, then f(a) = 0 for all a ∈ A;
hence, if f(X) ∉ Id(A), then there exists a ∈ A with f(a) ≠ 0. In particular, if
A = ∅, every f(X) ∈ k[X] must lie in Id(∅), for there are no elements a ∈ ∅.
Therefore, Id(∅) = k[X].

If f(X) ∈ Id(k^n), then f(a) = 0 for all a ∈ k^n; it follows from Proposition 7.32
that f(X) is the zero polynomial.

(ii) If f(X) ∈ Id(B), then f(b) = 0 for all b ∈ B; in particular, f(a) = 0 for all
a ∈ A, because A ⊆ B, and so f(X) ∈ Id(A).

(iii) Since A_ℓ ⊆ ∪_ℓ A_ℓ, we have Id(A_ℓ) ⊇ Id(∪_ℓ A_ℓ) for all ℓ; therefore,
∩_ℓ Id(A_ℓ) ⊇ Id(∪_ℓ A_ℓ). For the reverse inclusion, let f(X) ∈ ∩_ℓ Id(A_ℓ);
that is, f(a_ℓ) = 0 for all ℓ and all a_ℓ ∈ A_ℓ. If b ∈ ∪_ℓ A_ℓ, then b ∈ A_ℓ for some
ℓ, and hence f(b) = 0; therefore, f(X) ∈ Id(∪_ℓ A_ℓ). •

One would like to have a formula for Id(A ∩ B). Certainly, Id(A ∩ B) =
Id(A) ∪ Id(B) is not correct, for the union of two ideals is almost never an ideal
(see Exercise 7.36 on page 555).

The next idea arises in characterizing those ideals of the form Id( V) when V 
is an algebraic set. 

Definition. If I is an ideal in a commutative ring R, then its radical, denoted
by √I, is

√I = {r ∈ R : r^m ∈ I for some integer m ≥ 1}.

An ideal I is called a radical ideal⁵ if

√I = I.

Exercise 7.34 on page 555 asks you to prove that √I is an ideal. It is easy to
see that I ⊆ √I, and so an ideal I is a radical ideal if and only if √I ⊆ I. For
example, every prime ideal P is a radical ideal, for if f^n ∈ P, then f ∈ P. Here
is an example of an ideal that is not radical. Let b ∈ k and let I = ((x - b)^2).
Now I is not a radical ideal, for (x - b)^2 ∈ I while x - b ∉ I.

Definition. An element a in a commutative ring R is called nilpotent if there
is some n ≥ 1 with a^n = 0.

Note that I is a radical ideal in a commutative ring R if and only if R/I has
no nilpotent elements (of course, we mean that R/I has no nonzero nilpotent
elements).
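
A concrete instance may help here. Take the commutative ring R = ℤ and the ideal I = (n); this choice of ring, and the function names below, are assumptions made for illustration and are not taken from the text. The radical of (n) is generated by the product of the distinct primes dividing n, and the nilpotent elements of ℤ/nℤ are exactly the classes of the multiples of that product. A minimal sketch that checks this directly from the definitions:

```python
def radical_generator(n):
    """Generator of the radical of (n) in Z, for n >= 1:
    the product of the distinct primes dividing n."""
    rad, p = 1, 2
    while n > 1:
        if n % p == 0:
            rad *= p
            while n % p == 0:
                n //= p
        p += 1
    return rad

def nilpotents_mod(n):
    """Nilpotent elements of Z/nZ, found from the definition a^m = 0."""
    result = []
    for a in range(n):
        power = a % n
        for _ in range(n):           # checks a^1, ..., a^n, which is enough
            if power == 0:
                result.append(a)
                break
            power = (power * a) % n
    return result

print(radical_generator(12), nilpotents_mod(12))   # (12) is not radical
print(radical_generator(30), nilpotents_mod(30))   # (30) is radical
```

Here (12) is not a radical ideal because 6^2 ∈ (12) while 6 ∉ (12), and correspondingly ℤ/12ℤ has the nonzero nilpotent element 6; the ideal (30) is radical, and ℤ/30ℤ has no nonzero nilpotent elements.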

Here is why radical ideals are introduced. 

Proposition 7.39. If an ideal I = Id(A) for some A ⊆ k^n, where k is a
field, then it is a radical ideal. Hence, the coordinate ring k[A] has no nilpotent
elements.

Proof. Since I ⊆ √I is always true, it suffices to check the reverse inclusion.
By hypothesis, I = Id(A) for some A ⊆ k^n; hence, if f ∈ √I, then f^m ∈ Id(A);
that is, f(a)^m = 0 for all a ∈ A. But the values of f(a)^m lie in the field k, and
so f(a)^m = 0 implies f(a) = 0; that is, f ∈ Id(A) = I. •

Proposition 7.40.

(i) If I and J are ideals, then √(I ∩ J) = √I ∩ √J.

(ii) If I and J are radical ideals, then I ∩ J is a radical ideal.

5 This term is appropriate, for if r^m ∈ I, then its mth root r also lies in I.

Proof.

(i) If f ∈ √(I ∩ J), then f^m ∈ I ∩ J for some m ≥ 1. Hence, f^m ∈ I and
f^m ∈ J, and so f ∈ √I and f ∈ √J; that is, f ∈ √I ∩ √J.

For the reverse inclusion, assume that f ∈ √I ∩ √J, so that f^m ∈ I and
f^q ∈ J. We may assume that m ≥ q, and so f^m ∈ I ∩ J; that is, f ∈ √(I ∩ J).

(ii) If I and J are radical ideals, then I = √I and J = √J and

I ∩ J ⊆ √(I ∩ J) = √I ∩ √J = I ∩ J. •
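
Proposition 7.40(i) can be tested numerically in the ring ℤ, where (a) ∩ (b) = (lcm(a, b)) and the radical of (n) is generated by the product of the distinct primes dividing n. The choice of ℤ and the helper names rad and lcm are assumptions made for this sketch; the statement to check becomes rad(lcm(a, b)) = lcm(rad(a), rad(b)).

```python
from math import gcd

def rad(n):
    """Product of the distinct primes dividing n (n >= 1)."""
    result, p = 1, 2
    while n > 1:
        if n % p == 0:
            result *= p
            while n % p == 0:
                n //= p
        p += 1
    return result

def lcm(a, b):
    """Generator of the intersection (a) ∩ (b) in Z."""
    return a * b // gcd(a, b)

# Pairs (a, b) for which the radical of the intersection would differ
# from the intersection of the radicals; the proposition predicts none.
failures = [(a, b) for a in range(1, 40) for b in range(1, 40)
            if rad(lcm(a, b)) != lcm(rad(a), rad(b))]
print(failures)
```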

We are now going to prove Hilbert’s Nullstellensatz, which says that √I =
Id(Var(I)) for every ideal I ⊆ ℂ[X]; that is, a polynomial f(X) vanishes on
Var(I) if and only if f^m ∈ I for some m ≥ 1. This theorem is true for ideals in
k[X], where k is any algebraically closed field, and the astute reader can adapt
the proof we give for k = ℂ to any uncountable algebraically closed field k.
However, a new idea is needed to prove the theorem in general, so that it will
apply to the algebraic closures of the prime fields, for example, which are
countable.


Lemma 7.41. Let k be a field and let φ: k[X] → k be a surjective ring
homomorphism which fixes k pointwise. If J = ker φ, then Var(J) ≠ ∅.

Proof. For each i, we have x_i ∈ k[X]; let φ(x_i) = a_i ∈ k and let a =
(a_1, ..., a_n) ∈ k^n. If f(X) = Σ c_{e_1,...,e_n} x_1^{e_1} ... x_n^{e_n} ∈ k[X], where the
sum is over the exponents e_1, ..., e_n, then

φ(f(X)) = Σ c_{e_1,...,e_n} φ(x_1)^{e_1} ... φ(x_n)^{e_n}
        = Σ c_{e_1,...,e_n} a_1^{e_1} ... a_n^{e_n}
        = f(a_1, ..., a_n).

Hence, φ(f(X)) = f(a) = φ(f(a)), because f(a) ∈ k and φ fixes k pointwise.
It follows that f(X) - f(a) ∈ J for every f(X). Now if f(X) ∈ J, then
f(a) ∈ J. But f(a) ∈ k, and, since J is a proper ideal, it contains no nonzero
constants. Therefore, f(a) = 0 and a ∈ Var(J). •


Theorem 7.42 (Weak Nullstellensatz). If f_1(X), ..., f_t(X) ∈ ℂ[X], then
the ideal I = (f_1, ..., f_t) is a proper ideal in ℂ[X] if and only if

Var(f_1, ..., f_t) ≠ ∅.

Proof. One direction is clear: if Var(I) ≠ ∅, then I is a proper ideal, because
Var(ℂ[X]) = ∅.

For the converse, suppose that I is a proper ideal in ℂ[X]. By Corollary
7.27, there is a maximal ideal M containing I, and so K = ℂ[X]/M is a field. It
is plain that the natural map ℂ[X] → ℂ[X]/M = K carries ℂ to itself, so that
K/ℂ is an extension field; hence, K is a vector space over ℂ. Now ℂ[X] has
countable dimension, for a basis consists of all the monic monomials; it follows
that dim_ℂ(K) is countable (possibly finite).

Suppose that K is a proper extension of ℂ; that is, there is some t ∈ K with
t ∉ ℂ. Since ℂ is algebraically closed, t cannot be algebraic over ℂ, and so it is
transcendental. Consider the subset B of K,

B = {1/(t - c) : c ∈ ℂ}

(note that t - c ≠ 0 because t ∉ ℂ). The set B is uncountable, for it is indexed
by the uncountable set ℂ. We claim that B is linearly independent over ℂ; if
so, then we will have contradicted the fact that dim_ℂ(K) is countable. If B is
linearly dependent, there are distinct c_1, ..., c_r ∈ ℂ and nonzero
a_1, ..., a_r ∈ ℂ with Σ_{i=1}^{r} a_i/(t - c_i) = 0. Clearing denominators, we have
Σ_i a_i Π_{j ≠ i} (t - c_j) = 0; that is, in the ith summand the factor t - c_i is
omitted. Use this formula to define a polynomial h(x) ∈ ℂ[x]:

h(x) = Σ_i a_i Π_{j ≠ i} (x - c_j).

Now h(t) = 0, so that t transcendental implies h(x) is the zero polynomial. On
the other hand, h(c_1) = a_1(c_1 - c_2) ... (c_1 - c_r) ≠ 0, a contradiction. We
conclude that K/ℂ is not a proper extension; that is, K = ℂ. The natural map
ℂ[X] → K = ℂ[X]/M = ℂ now satisfies the hypothesis of Lemma 7.41, and
so Var(M) ≠ ∅. But Var(M) ⊆ Var(I), and this completes the proof. •

Consider the special case of this theorem for I = (f(x)) ⊆ ℂ[x], where
f(x) is not a constant. To say that Var(f) ⊆ ℂ is nonempty is to say that f(x)
has a complex root. Thus, the weak Nullstellensatz is a generalization to several
variables of the fundamental theorem of algebra.

A translation of the German term Nullstellensatz is “Locus-of-zeros theo- 
rem;” this name comes from the following corollary. 

Corollary 7.43. If I is a proper ideal in ℂ[X], then there is an element a =
(a_1, ..., a_n) ∈ ℂ^n with f(a) = 0 for all f ∈ I.

Proof. Choose a to be any element in Var(I). •

The following proof of Hilbert’s Nullstellensatz uses the “Rabinowitch trick”
of imbedding a polynomial ring in n variables into a polynomial ring in n + 1
variables.

Theorem 7.44 (Nullstellensatz). If I is an ideal in ℂ[X], then

Id(Var(I)) = √I.

Thus, f vanishes on Var(I) if and only if f^m ∈ I for some m ≥ 1.

Proof. The inclusion Id(Var(I)) ⊇ √I is obviously true, for if f^m(a) = 0 for
some m ≥ 1 and all a ∈ Var(I), then f(a) = 0 for all a, because f(a) ∈ ℂ.

For the converse, assume that h ∈ Id(Var(I)), where I = (f_1, ..., f_t); that
is, if f_i(a) = 0 for all i, where a ∈ ℂ^n, then h(a) = 0. We must show that some
power of h lies in I. Of course, we may assume that h is not the zero polynomial.
Let us regard

ℂ[x_1, ..., x_n] ⊆ ℂ[x_1, ..., x_n, y];

thus, every f_i(x_1, ..., x_n) is regarded as a polynomial in n + 1 variables that
does not depend on the last variable y. We claim that the polynomials

f_1, ..., f_t, 1 - yh

in ℂ[x_1, ..., x_n, y] have no common zeros. If (a_1, ..., a_n, b) ∈ ℂ^{n+1} is a
common zero, then a = (a_1, ..., a_n) ∈ ℂ^n is a common zero of f_1, ..., f_t, and so
h(a) = 0. But now 1 - bh(a) = 1 ≠ 0. The weak Nullstellensatz now applies to
show that the ideal (f_1, ..., f_t, 1 - yh) in ℂ[x_1, ..., x_n, y] is not a proper ideal.
Therefore, there are g_1, ..., g_{t+1} ∈ ℂ[x_1, ..., x_n, y] with

1 = f_1 g_1 + ... + f_t g_t + (1 - yh) g_{t+1}.

Now make the substitution y = 1/h, so that the last term involving g_{t+1}
vanishes. Writing the polynomials g_i(X, y) more explicitly:

g_i(X, y) = Σ_{j=0}^{d_i} u_j(X) y^j

[so that g_i(X, h^{-1}) = Σ_{j=0}^{d_i} u_j(X) h^{-j}], we see that

h^{d_i} g_i(X, h^{-1}) ∈ ℂ[X].

Therefore, if m = max{d_1, ..., d_t}, then

h^m = (h^m g_1) f_1 + ... + (h^m g_t) f_t ∈ I. •
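
The Rabinowitch trick can be followed on a tiny example. The example is chosen here for illustration and does not appear in the text: n = 1, I = (x^2), so f_1 = x^2 and Var(I) = {0}, and h = x vanishes on Var(I). One choice of cofactors is g_1 = y^2 and g_2 = 1 + xy, giving 1 = f_1 g_1 + (1 - yh) g_2. The sketch below verifies that identity with a minimal dictionary representation of polynomials in x and y (exponent pairs mapped to integer coefficients); the helper names are hypothetical.

```python
from collections import defaultdict

def add(p, q):
    """Sum of two polynomials given as {(i, j): coeff} for x^i y^j."""
    out = defaultdict(int)
    for poly in (p, q):
        for e, c in poly.items():
            out[e] += c
    return {e: c for e, c in out.items() if c != 0}

def mul(p, q):
    """Product of two polynomials in the same representation."""
    out = defaultdict(int)
    for (i1, j1), c1 in p.items():
        for (i2, j2), c2 in q.items():
            out[(i1 + i2, j1 + j2)] += c1 * c2
    return {e: c for e, c in out.items() if c != 0}

ONE = {(0, 0): 1}
f1 = {(2, 0): 1}                         # f_1 = x^2
h = {(1, 0): 1}                          # h = x
g1 = {(0, 2): 1}                         # g_1 = y^2, so d_1 = 2
g2 = {(0, 0): 1, (1, 1): 1}              # g_2 = 1 + xy
one_minus_yh = add(ONE, {(1, 1): -1})    # 1 - yh = 1 - xy

identity = add(mul(f1, g1), mul(one_minus_yh, g2))
print(identity == ONE)                   # 1 = f_1 g_1 + (1 - yh) g_2

# Substituting y = 1/h and multiplying by h^m with m = d_1 = 2 gives
# h^2 = (h^2 * g_1(X, 1/h)) * f_1 = 1 * x^2, so h^2 lies in I.
h_squared = mul(h, h)
print(h_squared == f1)
```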

Theorem 7.45. Every maximal ideal M in ℂ[x_1, ..., x_n] has the form

M = (x_1 - a_1, ..., x_n - a_n),

where a = (a_1, ..., a_n) ∈ ℂ^n, and so there is a bijection between ℂ^n and the
maximal ideals in ℂ[x_1, ..., x_n].

Proof. Since M is a proper ideal, we have Var(M) ≠ ∅, by Theorem 7.42;
that is, there is a = (a_1, ..., a_n) ∈ ℂ^n with f(a) = 0 for all f ∈ M. Since
Var(M) = {b ∈ ℂ^n : f(b) = 0 for all f ∈ M}, we have {a} ⊆ Var(M).
Therefore, Proposition 7.35 gives

Id(Var(M)) ⊆ Id({a}).

Now Theorem 7.44 gives Id(Var(M)) = √M. But √P = P for every prime
ideal P; as maximal ideals are prime, we have Id(Var(M)) = M. Note that
Id({a}) is a proper ideal, for it does not contain any nonzero constant; and so
maximality of M gives M = Id({a}). Let us compute Id({a}) = {f(X) ∈
ℂ[X] : f(a) = 0}. If, for each i, f_i(x_1, ..., x_n) = x_i - a_i, then f_i(a) = 0,
and so x_i - a_i ∈ Id({a}). Hence, (x_1 - a_1, ..., x_n - a_n) ⊆ Id({a}). But
(x_1 - a_1, ..., x_n - a_n) is a maximal ideal, by Corollary 7.10, so that

(x_1 - a_1, ..., x_n - a_n) = Id({a}) = M. •

Hilbert proved the Nullstellensatz in 1893. The original proofs of the Null- 
stellensatz for arbitrary algebraically closed fields, in the 1920’s, used “elimina- 
tion theory” (see van der Waerden, Modern Algebra, Section 79) or the Noether 
normalization theorem (see Zariski and Samuel, Commutative Algebra II, pp. 
164 - 167). Less computational proofs, involving Jacobson rings, were found 
around 1960, independently, by W. Krull and O. Goldman. 

We continue the study of the operators Var and Id. 

Proposition 7.46. Let k be a field.

(i) For every subset A ⊆ k^n,

Var(Id(A)) ⊇ A.

(ii) For every ideal I ⊆ k[X],

Id(Var(I)) ⊇ I.

(iii) If V is an algebraic set of k^n, then

Var(Id(V)) = V.


Proof. 

(i) This result is almost a tautology. If a ∈ A, then f(a) = 0 for all f(X) ∈
Id(A). But every f(X) ∈ Id(A) annihilates A, by definition of Id(A), and so
a ∈ Var(Id(A)). Therefore, Var(Id(A)) ⊇ A.

(ii) Again, one merely looks at the definitions. If f(X) ∈ I, then f(a) = 0
for all a ∈ Var(I); hence, f(X) is surely one of the polynomials annihilating
Var(I).

(iii) If V is an algebraic set, then V = Var(J) for some ideal J in k[X]. Now

Var(Id(Var(J))) ⊇ Var(J),

by part (i). Also, part (ii) gives Id(Var(J)) ⊇ J, and applying Proposition 7.35(i)
gives the reverse inclusion

Var(Id(Var(J))) ⊆ Var(J).

Therefore, Var(Id(Var(J))) = Var(J); that is, Var(Id(V)) = V. •


Corollary 7.47.

(i) If V_1 and V_2 are algebraic sets and Id(V_1) = Id(V_2), then V_1 = V_2.

(ii) If I_1 and I_2 are radical ideals and Var(I_1) = Var(I_2), then I_1 = I_2.

Proof.

(i) If Id(V_1) = Id(V_2), then Var(Id(V_1)) = Var(Id(V_2)); by Proposition 7.46(iii),
we have V_1 = V_2.

(ii) If Var(I_1) = Var(I_2), then Id(Var(I_1)) = Id(Var(I_2)). By the Nullstellensatz,
√I_1 = √I_2. Since I_1 and I_2 are radical ideals, by hypothesis, we have I_1 = I_2. •


The next theorem summarizes this discussion. 


Theorem 7.48. The functions V ↦ Id(V) and I ↦ Var(I) are inverse
order-reversing bijections

{algebraic sets ⊆ ℂ^n} ⇄ {radical ideals ⊆ ℂ[x_1, ..., x_n]}.

Proof. By Proposition 7.46, we have Var(Id(V)) = V for every algebraic set
V, while Theorem 7.44 gives Id(Var(I)) = √I for every ideal I. •

Can an algebraic set be decomposed into smaller algebraic subsets? 


Definition. An algebraic set V is irreducible if it is not a union of two proper
algebraic subsets; that is, V ≠ W' ∪ W'', where both W' and W'' are algebraic
sets that are proper subsets of V. A variety⁶ is an irreducible algebraic set.

6 The term variety arose as a translation by E. Beltrami (inspired by Gauss) of the German
term Mannigfaltigkeit used by Riemann; nowadays, this term is usually translated as manifold.

Proposition 7.49. Every algebraic set V in k^n is a union of finitely many
varieties:

V = V_1 ∪ V_2 ∪ ... ∪ V_m.

Proof. Call an algebraic set W ⊆ k^n good if it is irreducible or a union of
finitely many varieties; otherwise, call W bad. We must show that there are no
bad algebraic sets. If W is bad, it is not irreducible, and so W = W' ∪ W'', where
both W' and W'' are proper algebraic subsets. But a union of good algebraic sets
is good, and so at least one of W' and W'' is bad; say, W' is bad, and rename it
W' = W_1. Repeat this construction for W_1 to get a bad algebraic subset W_2. It
follows by induction that there exists a strictly descending sequence

W ⊋ W_1 ⊋ ... ⊋ W_n ⊋ ...

of bad algebraic subsets. Since the operator Id reverses inclusions, there is a
strictly increasing chain of ideals

Id(W) ⊊ Id(W_1) ⊊ ... ⊊ Id(W_n) ⊊ ...

[the inclusions are strict because of Corollary 7.47(i)], and this contradicts the
Hilbert basis theorem. We conclude that every algebraic set is good. •

Varieties have a nice characterization. 

Proposition 7.50. An algebraic set V in k^n is a variety if and only if Id(V) is a
prime ideal in k[X]. Hence, the coordinate ring k[V] of a variety V is a domain.

Proof. Assume that V is a variety. It suffices to show that if f_1(X), f_2(X) ∉
Id(V), then f_1(X)f_2(X) ∉ Id(V). Define, for i = 1, 2,

W_i = V ∩ Var(f_i(X)).

Note that each W_i is an algebraic subset of V, for it is the intersection of two
algebraic subsets; moreover, since f_i(X) ∉ Id(V), there is some a_i ∈ V with
f_i(a_i) ≠ 0, and so W_i is a proper algebraic subset of V. Since V is irreducible,
we cannot have V = W_1 ∪ W_2. Thus, there is some b ∈ V which is not in
W_1 ∪ W_2; that is, f_1(b) ≠ 0 ≠ f_2(b). Therefore, f_1(b)f_2(b) ≠ 0, hence
f_1(X)f_2(X) ∉ Id(V), and so Id(V) is a prime ideal.

Conversely, assume that Id(V) is a prime ideal. Suppose that V = V_1 ∪ V_2,
where V_1 and V_2 are algebraic subsets. If V_2 ⊊ V, then we must show that
V = V_1. Now

Id(V) = Id(V_1) ∩ Id(V_2) ⊇ Id(V_1)Id(V_2);

the equality is given by Proposition 7.38, and the inequality is given by
Exercise 7.11 on page 522. Since Id(V) is a prime ideal, Exercise 7.11(ii) says that
Id(V_1) ⊆ Id(V) or Id(V_2) ⊆ Id(V). Now V_2 ⊊ V implies Id(V_2) ⊋ Id(V), and
we conclude that Id(V_1) ⊆ Id(V). But the reverse inequality Id(V_1) ⊇ Id(V)
holds as well, because V_1 ⊆ V, and so Id(V_1) = Id(V). Therefore, V_1 = V, by
Corollary 7.47, and so V is irreducible; that is, V is a variety. •

We now consider whether the varieties in the decomposition of an algebraic 
set into a union of varieties are uniquely determined. There is one obvious way 
to arrange nonuniqueness. Suppose that there are two prime ideals P ⊊ Q
in k[X] (for example, (x) ⊊ (x, y) are such prime ideals in k[x, y]). Now
Var(Q) ⊊ Var(P), so that if Var(P) is a subvariety of a variety V, say, V =
Var(P) ∪ V_2 ∪ ... ∪ V_m, then Var(Q) can be one of the V_i or it can be left out.

Definition. A decomposition V = V_1 ∪ ... ∪ V_m is an irredundant union if no
V_i can be omitted; that is, for all i,

V ≠ ∪_{j ≠ i} V_j

(the union of all the V_j with V_i left out).

Proposition 7.51. Every algebraic set V is an irredundant union of varieties

V = V_1 ∪ ... ∪ V_m;

moreover, the varieties V_i are uniquely determined by V.

Proof. By Proposition 7.49, V is a union of finitely many varieties; say, V =
V_1 ∪ ... ∪ V_m. If m is chosen minimal, then this union must be irredundant.

We now prove uniqueness. Suppose that V = W_1 ∪ ... ∪ W_s is an irredundant
union of varieties. Let X = {V_1, ..., V_m} and let Y = {W_1, ..., W_s}; we shall
show that X = Y. If V_i ∈ X, we have

V_i = V_i ∩ V = ∪_j (V_i ∩ W_j).

Each V_i ∩ W_j is an algebraic subset of V_i; since V_i is irreducible, we must
have V_i = V_i ∩ W_j for some j, and so V_i ⊆ W_j. The same argument applied
to W_j shows that there is some V_ℓ with W_j ⊆ V_ℓ. Hence,

V_i ⊆ W_j ⊆ V_ℓ.

Since the union V_1 ∪ ... ∪ V_m is irredundant, we must have V_i = V_ℓ, and so
V_i = W_j = V_ℓ; that is, V_i ∈ Y and X ⊆ Y. The reverse inclusion is proved in
the same way. •

Definition. An intersection I = J_1 ∩ ... ∩ J_m is an irredundant intersection
if no J_i can be omitted; that is, for all i,

I ≠ ∩_{j ≠ i} J_j

(the intersection of all the J_j with J_i left out).

Corollary 7.52. Every radical ideal J in k[X] is an irredundant intersection
of prime ideals,

J = P_1 ∩ ... ∩ P_m;

moreover, the prime ideals P_i are uniquely determined by J.

Proof. Since J is a radical ideal, there is an algebraic set V with J = Id(V). Now V
is an irredundant union of irreducible algebraic subsets,

V = V_1 ∪ ... ∪ V_m,

so that

J = Id(V) = Id(V_1) ∩ ... ∩ Id(V_m).

By Proposition 7.50, V_i irreducible implies Id(V_i) is prime, and so J is an
intersection of prime ideals. This is an irredundant intersection, for if there is ℓ with
J = Id(V) = ∩_{j ≠ ℓ} Id(V_j), then

V = Var(Id(V)) = ∪_{j ≠ ℓ} Var(Id(V_j)) = ∪_{j ≠ ℓ} V_j,

contradicting the given irredundancy of the union.

Uniqueness is proved similarly. If J = Id(W_1) ∩ ... ∩ Id(W_s), where each
Id(W_i) is a prime ideal (hence is a radical ideal), then each W_i is a variety.
Applying Var expresses V = Var(Id(V)) = Var(J) as an irredundant union of
varieties, and the uniqueness of this decomposition gives the uniqueness of the
prime ideals in the intersection. •

Here are some natural problems arising as one investigates these ideas fur- 
ther. First, what is the dimension of a variety? There are several candidates, 
and it turns out that prime ideals are the key. If V is a variety, then its dimen- 
sion is the length of a longest chain of prime ideals in its coordinate ring k[V] 
(which, by the correspondence theorem, is the length of a longest chain of prime 
ideals above Id(V) in k[X]). Another problem involves intersections. If V = Var(f)
is a curve arising from a polynomial f of degree d, how many points lie in the
intersection of V with a straight line? Bézout’s theorem says there should be d
points, but one must be careful. First, one must demand that the coefficient field
be algebraically closed, lest Var(f) = ∅ cause a problem. But there may also be
multiple roots, and so some intersections may have to be counted with a certain
multiplicity in order to have Bézout’s theorem hold. Defining multiplicities for
intersections of higher dimensional varieties is very subtle. 

It turns out to be more convenient to work in a larger projective space. Recall
that we gave two constructions of a projective plane over a field k. In the
subsection of Chapter 3 called Horizons, we constructed a projective plane by
adjoining a line at infinity to the plane k^2, and this construction can be generalized
to higher dimensions by adjoining a “hyperplane at infinity” to k^n. To distinguish
the subset k^n from projective space, one calls it affine space, for k^n consists of
the “finite points”; that is, those points not at infinity. We gave an alternate
construction of a projective plane, in Example 4.26, in which points are essentially
lines through the origin in k^3. Each point in this construction has homogeneous
coordinates [a_0, a_1, a_2], where a_i ∈ k and [a'_0, a'_1, a'_2] = [a_0, a_1, a_2] if there is
a nonzero t ∈ k with a'_i = ta_i for all i. This construction is more amenable to
algebraic sets. For fixed n ≥ 1, define an equivalence relation on k^{n+1} - {0} by

(a'_0, ..., a'_n) ≡ (a_0, ..., a_n)

if there is a nonzero t ∈ k with a'_i = ta_i for all i. Denote the equivalence class
of (a_0, ..., a_n) by [a_0, ..., a_n], call it a projective point, and define projective
n-space over k, denoted by P^n(k), to be the set of all projective points. It is now
natural to define projective algebraic sets as zeros of a family of polynomials.
For example, if f(X) ∈ k[X] = k[x_0, x_1, ..., x_n], define

Var(f) = {[a_0, ..., a_n] ∈ P^n(k) : f([a_0, ..., a_n]) = 0}.

There is a problem with this definition: f(X) is defined on points in k^{n+1},
not on projective points; that is, we need f(a_0, ..., a_n) = 0 if and only if
f(ta_0, ..., ta_n) = 0, where t ∈ k is nonzero. A polynomial f(x_0, ..., x_n)
is called homogeneous of degree m ≥ 0 if

f(tx_0, ..., tx_n) = t^m f(x_0, ..., x_n)

for all t ∈ k. For example, a monomial c x_0^{e_0} ... x_n^{e_n} is homogeneous of degree
m, where m = e_0 + ... + e_n is its total degree; a polynomial f(X) ∈ k[X] is
homogeneous of degree m if f(X) = Σ c_{e_0,...,e_n} x_0^{e_0} ... x_n^{e_n}, where the monomials
all have total degree m. If f(X) is homogeneous and f(a_0, ..., a_n) = 0, then
f(ta_0, ..., ta_n) = t^m f(a_0, ..., a_n) = 0. Thus, when f(X) is homogeneous,
it makes sense to say that a projective point is a zero of f. Define a projective
algebraic set as follows.

Definition. If F ⊆ k[X] = k[x_0, ..., x_n] is a set of homogeneous polynomials,
then the projective algebraic set defined by F is

Var(F) = {[a] ∈ P^n(k) : f([a]) = 0 for every f(X) ∈ F},

where [a] abbreviates [a_0, ..., a_n].
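
Over a finite field, projective points and projective algebraic sets can be listed explicitly. The following sketch makes several assumptions that are not in the text: the field is F_3, each projective point is represented by the scalar multiple whose first nonzero coordinate is 1, polynomials are dictionaries from exponent tuples to coefficients, and the example conic x_0 x_2 - x_1^2 is chosen only for illustration.

```python
from itertools import product

P = 3  # illustrative choice: the finite field F_3

def projective_points(n):
    """One representative for each point of projective n-space over F_P:
    the multiple whose first nonzero coordinate equals 1."""
    pts = set()
    for a in product(range(P), repeat=n + 1):
        if any(a):
            lead = next(c for c in a if c)
            inv = pow(lead, -1, P)        # inverse mod P (Python 3.8+)
            pts.add(tuple(c * inv % P for c in a))
    return sorted(pts)

def is_homogeneous(poly):
    """True if all monomials with nonzero coefficient share a total degree."""
    return len({sum(e) for e, c in poly.items() if c % P}) <= 1

def evaluate(poly, a):
    """Value in F_P of the polynomial at the tuple a."""
    total = 0
    for e, c in poly.items():
        term = c
        for ai, ei in zip(a, e):
            term *= ai ** ei
        total += term
    return total % P

def projective_var(polys, n):
    """Projective algebraic set of a list of homogeneous polynomials."""
    assert all(is_homogeneous(f) for f in polys)
    return [a for a in projective_points(n)
            if all(evaluate(f, a) == 0 for f in polys)]

conic = {(1, 0, 1): 1, (0, 2, 0): -1}     # x_0 x_2 - x_1^2
conic_points = projective_var([conic], 2)
print(len(projective_points(2)), conic_points)
```

Because the conic is homogeneous, whether it vanishes does not depend on the representative chosen: for instance, (2, 1, 2) = 2 · (1, 2, 1) is a zero exactly when (1, 2, 1) is.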

The reason for introducing projective space is that it is often the case that
many separate affine cases become part of one simpler projective formula.
Indeed, Bézout’s theorem is an example of this phenomenon.

Exercises

7.33 Prove that if an element a in a commutative ring R is nilpotent, then 1 + a is a
unit.

*7.34 If I is an ideal in a commutative ring R, prove that its radical, √I, is an ideal.

7.35 If R is a commutative ring, then its nilradical nil(R) is defined to be the
intersection of all the prime ideals in R. Prove that nil(R) is the set of all the nilpotent
elements in R:

nil(R) = {r ∈ R : r^m = 0 for some m ≥ 1}.

*7.36 If I and J are ideals in ℂ[X], prove that Id(Var(I) ∩ Var(J)) = √(I + J).

7.37 If k is a field, a hypersurface is a subset of k^n of the form Var(f), where f ∈
k[x_1, ..., x_n]. Prove that every algebraic set Var(I) is an intersection of finitely
many hypersurfaces.

7.38 (i) Show that x^2 + y^2 is irreducible in ℝ[x, y], and conclude that (x^2 + y^2)
is a prime, hence radical, ideal in ℝ[x, y].

(ii) Prove that Var(x^2 + y^2) = {(0, 0)}.

(iii) Prove that Id(Var(x^2 + y^2)) ⊋ (x^2 + y^2), and conclude that the radical
ideal (x^2 + y^2) in ℝ[x, y] is not of the form Id(V) for some algebraic
set V. Conclude that the Nullstellensatz may fail in k[X] if k is not
algebraically closed.

(iv) Prove that (x^2 + y^2) = (x + iy) ∩ (x - iy) in ℂ[x, y].

(v) Prove that Id(Var(x^2 + y^2)) = (x^2 + y^2) in ℂ[x, y].

7.39 Prove that every radical ideal in ℂ[X] is an irredundant intersection of prime ideals.

7.40 Prove that if f_1, ..., f_t ∈ ℂ[X], then Var(f_1, ..., f_t) = ∅ if and only if there are
h_1, ..., h_t ∈ ℂ[X] such that

1 = Σ_{i=1}^{t} h_i(X) f_i(X).

7.41 Consider the statements:

I. If I is a proper ideal in ℂ[X], then Var(I) ≠ ∅.

II. Id(Var(I)) = √I.

III. Every maximal ideal in ℂ[X] has the form (x_1 - a_1, ..., x_n - a_n).

Prove III ⇒ I. (We have proved I ⇒ II and II ⇒ III in the text.)

7.42 Let R be a commutative ring, and let

Spec(R)

denote the set of all the prime ideals in R. If E ⊆ Spec(R), define the closure of a
subset E = {P_α : α ∈ A} of Spec(R) to be

\overline{E} = {all the prime ideals P in R with P_α ⊆ P for all P_α ∈ E}.

Prove the following:

(i) \overline{{0}} = Spec(R).

(ii) \overline{R} = ∅.

(iii) \overline{Σ_ℓ E_ℓ} = ∩_ℓ \overline{E_ℓ}.

(iv) \overline{E ∩ F} = \overline{E} ∪ \overline{F}.

Conclude that the family of all subsets of Spec(R) of the form \overline{E} is a topology on
Spec(R); it is called the Zariski topology.

7.43 Prove that an ideal P in Spec(R) is closed in the Zariski topology if and only if P 
is a maximal ideal. 

7.44 If X and Y are topological spaces, then a function g: X → Y is continuous if, for
each closed subset Q of Y, the inverse image g^{-1}(Q) is a closed subset of X.

Let f: R → A be a ring homomorphism, and define f*: Spec(A) → Spec(R)
by f*(Q) = f^{-1}(Q), where Q is any prime ideal in A. Prove that f* is a
continuous function. [Recall Exercise 7.3 on page 521: f^{-1}(Q) is a prime ideal.]

7.45 Prove that the function φ: ℂ^n → Spec(ℂ[x_1, ..., x_n]), defined by

φ: (a_1, ..., a_n) ↦ (x_1 - a_1, ..., x_n - a_n),

is a continuous injection (where both ℂ^n and Spec(ℂ[x_1, ..., x_n]) are equipped
with the Zariski topology; the Zariski topology on ℂ^n was defined on page 543).

7.46 Prove that any descending chain

F_1 ⊇ F_2 ⊇ ... ⊇ F_m ⊇ F_{m+1} ⊇ ...

of closed sets in ℂ^n stops; there is some t with F_t = F_{t+1} = ... .


7.5 Gröbner Bases

Given two polynomials f(x), g(x) ∈ k[x] with g(x) ≠ 0, where k is a field,
when is g(x) a divisor of f(x)? The division algorithm gives unique polynomials
q(x), r(x) ∈ k[x] with

f(x) = q(x)g(x) + r(x),

where r = 0 or deg(r) < deg(g), and g | f if and only if the remainder r = 0.
Let us look at this formula from a different point of view. To say that g | f is to
say that f ∈ (g), the principal ideal generated by g(x). Thus, the remainder r is
the obstruction to f lying in this ideal; that is, f ∈ (g) if and only if r = 0.
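
The division algorithm in k[x] is short enough to write out. This is a minimal sketch under assumptions made here, not the text's own presentation: k = ℚ, a polynomial is a list of coefficients with the constant term first, and the names poly_divmod and divides are hypothetical.

```python
from fractions import Fraction

def poly_divmod(f, g):
    """Return (q, r) with f = q*g + r and r = 0 or deg(r) < deg(g), in Q[x].
    Polynomials are coefficient lists, constant term first; [] is zero."""
    g = [Fraction(c) for c in g]
    while g and g[-1] == 0:
        g.pop()
    if not g:
        raise ZeroDivisionError("g must be a nonzero polynomial")
    r = [Fraction(c) for c in f]
    while r and r[-1] == 0:
        r.pop()
    q = [Fraction(0)] * max(len(r) - len(g) + 1, 1)
    while len(r) >= len(g):
        shift = len(r) - len(g)
        coeff = r[-1] / g[-1]            # cancel the leading term of r
        q[shift] = coeff
        for i, c in enumerate(g):
            r[i + shift] -= coeff * c
        while r and r[-1] == 0:
            r.pop()
    return q, r

def divides(g, f):
    """g | f if and only if the remainder is zero, i.e. f lies in (g)."""
    return poly_divmod(f, g)[1] == []

q, r = poly_divmod([1, 0, 1], [-1, 1])   # divide x^2 + 1 by x - 1
print(q, r)
print(divides([-1, 1], [-1, 0, 0, 1]))   # does x - 1 divide x^3 - 1?
```

Exact rational arithmetic (Fraction) is used so that the test "r = 0" is reliable; floating-point coefficients would make the membership test f ∈ (g) unsafe.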
Consider a more general problem. Given polynomials

f(x), g_1(x), ..., g_m(x) ∈ k[x],

where k is a field, when is d(x) = gcd{g_1(x), ..., g_m(x)} a divisor of f? The
euclidean algorithm finds d, and the division algorithm determines whether d | f.
From another viewpoint, the two classical algorithms combine to give an
algorithm determining whether f ∈ (g_1, ..., g_m) = (d).

We now ask whether there is an algorithm in k[x_1, ..., x_n] = k[X] to
determine, given f(X), g_1(X), ..., g_m(X) ∈ k[X], whether f ∈ (g_1, ..., g_m). A
division algorithm in k[X] should be an algorithm yielding

r(X), a_1(X), ..., a_m(X) ∈ k[X],

with r(X) unique, such that

f = a_1 g_1 + ... + a_m g_m + r.

Since (g_1, ..., g_m) consists of all the linear combinations of the g’s, such a
generalized division algorithm would say again that the remainder r is the
obstruction: f ∈ (g_1, ..., g_m) if and only if r = 0.
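
In one variable, the combination of the euclidean algorithm and the division algorithm described above already gives a membership test for (g_1, ..., g_m). The sketch below assumes k = ℚ and represents polynomials as coefficient lists with the constant term first; the function names are chosen here for illustration. It does not address the several-variable problem, which is the subject of this section.

```python
from fractions import Fraction

def trim(p):
    """Copy p with exact coefficients and no trailing zero coefficients."""
    p = [Fraction(c) for c in p]
    while p and p[-1] == 0:
        p.pop()
    return p

def poly_mod(f, g):
    """Remainder of f on division by the nonzero polynomial g."""
    r, g = trim(f), trim(g)
    while len(r) >= len(g):
        shift, coeff = len(r) - len(g), r[-1] / g[-1]
        for i, c in enumerate(g):
            r[i + shift] -= coeff * c
        r = trim(r)
    return r

def poly_gcd(f, g):
    """Monic gcd of f and g by the euclidean algorithm ([] if both are zero)."""
    a, b = trim(f), trim(g)
    while b:
        a, b = b, poly_mod(a, b)
    return [c / a[-1] for c in a] if a else a

def in_ideal(f, gens):
    """Decide whether f lies in (g_1, ..., g_m) = (d), where d is the gcd."""
    d = []
    for g in gens:
        d = poly_gcd(d, g)
    if not d:                      # the ideal is (0)
        return trim(f) == []
    return poly_mod(f, d) == []

gens = [[-1, 0, 1], [2, -3, 1]]            # x^2 - 1 and x^2 - 3x + 2
print(poly_gcd(gens[0], gens[1]))          # d = x - 1
print(in_ideal([-1, 0, 0, 1], gens))       # x^3 - 1
print(in_ideal([1, 1], gens))              # x + 1
```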

We are going to show that both the division algorithm and the euclidean al- 
gorithm can be extended to polynomials in several variables. Even though these 
results are elementary, they were discovered only recently, in 1965, by B. Buch- 
berger. Algebra has always dealt with algorithms, but the power and beauty of 
the axiomatic method has dominated the subject ever since Cayley and Dedekind 
in the second half of the nineteenth century. After the invention of the transis- 
tor in 1948, high-speed calculation became a reality, and old complicated algo- 
rithms, as well as new ones, could be implemented; a higher order of computing 
had entered algebra. Most likely, the development of computer science is a ma- 
jor reason why generalizations of the classical algorithms, from polynomials in 
one variable to polynomials in several variables, are only now being discovered. 
This is a dramatic illustration of the impact of external ideas on mathematics. 


Monomial Orders 

The most important feature of the division algorithm in $k[x]$ is that the remainder
$r(x)$ has small degree. Without the inequality $\deg(r) < \deg(g)$, the result would
be virtually useless; after all, given any $Q(x) \in k[x]$, there is an equation

\[ f(x) = Q(x)g(x) + [f(x) - Q(x)g(x)]. \]

Now polynomials in several variables are sums of monomials $cx_1^{\alpha_1} \cdots x_n^{\alpha_n}$, where
$c \in k$ and $\alpha_i \geq 0$ for all $i$. Here are two degrees that one can assign to a
monomial.


Definition. The multidegree of a monomial $cx_1^{\alpha_1} \cdots x_n^{\alpha_n} \in k[x_1, \dots, x_n]$, where
$c \in k$ is nonzero and $\alpha_i \geq 0$ for all $i$, is the $n$-tuple $\alpha = (\alpha_1, \dots, \alpha_n)$; its total
degree is the sum $|\alpha| = \alpha_1 + \cdots + \alpha_n$.

When dividing $f(x)$ by $g(x)$ in $k[x]$, one usually arranges the monomials in
$f(x)$ in descending order, according to degree:

\[ f(x) = c_nx^n + c_{n-1}x^{n-1} + \cdots + c_2x^2 + c_1x + c_0. \]

Consider a polynomial in several variables:

\[ f(X) = f(x_1, \dots, x_n) = \sum c_{(\alpha_1, \dots, \alpha_n)} x_1^{\alpha_1} \cdots x_n^{\alpha_n}. \]

We will abbreviate $(\alpha_1, \dots, \alpha_n)$ to $\alpha$ and $x_1^{\alpha_1} \cdots x_n^{\alpha_n}$ to $X^\alpha$, so that $f(X)$ can be
written more compactly as

\[ f(X) = \sum_{\alpha} c_\alpha X^\alpha. \]

Our aim is to arrange the monomials involved in $f(X)$ in a reasonable way, and
we do this by ordering their multidegrees.

The set $\mathbb{N}^n$, consisting of all the $n$-tuples $\alpha = (\alpha_1, \dots, \alpha_n)$ of natural
numbers, is a commutative monoid⁷ under addition:

\[ (\alpha_1, \dots, \alpha_n) + (\beta_1, \dots, \beta_n) = (\alpha_1 + \beta_1, \dots, \alpha_n + \beta_n). \]

This monoid operation is related to the multiplication of monomials:

\[ X^\alpha X^\beta = X^{\alpha + \beta}. \]

Recall that a partially ordered set is a nonempty set $X$ equipped with a relation
$\preceq$ which is reflexive, antisymmetric, and transitive. Of course, we may write
$x \prec y$ if $x \preceq y$ and $x \neq y$, and we may write $y \succeq x$ (or $y \succ x$) instead of $x \preceq y$
(or $x \prec y$).

Definition. A partially ordered set $X$ is well-ordered if every nonempty subset
$S \subseteq X$ contains a smallest element; that is, there exists $s_0 \in S$ with $s_0 \preceq s$ for
all $s \in S$.

For example, the Least Integer Axiom says that the natural numbers $\mathbb{N}$ with
the usual inequality $\leq$ is well-ordered.

Proposition 7.53. Let $X$ be a well-ordered set.

(i) If $x, y \in X$, then either $x \preceq y$ or $y \preceq x$.

(ii) Every strictly decreasing sequence is finite.

⁷ Recall that a monoid is a set with an associative binary operation and an element one.
Here, the operation is $+$, the one is $(0, \dots, 0)$, and the operation is commutative: $\alpha + \beta = \beta + \alpha$.

Proof.

(i) The subset $S = \{x, y\}$ has a smallest element, which must be either $x$ or $y$. In
the first case, $x \preceq y$; in the second case, $y \preceq x$.

(ii) Assume that there is an infinite strictly decreasing sequence, say,

\[ x_1 \succ x_2 \succ x_3 \succ \cdots. \]

Since $X$ is well-ordered, the subset $S$ consisting of all the $x_i$ has a smallest
element, say, $x_n$. But $x_{n+1} \prec x_n$, a contradiction. •

The second property of well-ordered sets will be used in showing that an 
algorithm eventually stops. In the proof of the division algorithm for polynomials 
in one variable, for example, we associated a natural number to each step: the 
degree of the remainder. Moreover, if the algorithm does not stop at a given step, 
then the natural number associated to the next step - the degree of its remainder 
- is strictly smaller. Since the natural numbers are well-ordered by the usual 
inequality <, this strictly decreasing sequence of natural numbers must be finite; 
that is, the algorithm must stop after a finite number of steps. 

We are interested in orderings of multidegrees that are compatible with
multiplication of monomials, that is, with addition in the monoid $\mathbb{N}^n$.


Definition. A monomial order is a well-order on $\mathbb{N}^n$ such that

\[ \alpha \preceq \beta \quad\text{implies}\quad \alpha + \gamma \preceq \beta + \gamma \]

for all $\alpha, \beta, \gamma \in \mathbb{N}^n$.

A monomial order will be used as follows. If $X = (x_1, \dots, x_n)$, then we
define $X^\alpha \preceq X^\beta$ in case $\alpha \preceq \beta$; that is, monomials are ordered according to
their multidegrees.


Definition. If $\mathbb{N}^n$ is equipped with a monomial order, then every $f(X) \in
k[X] = k[x_1, \dots, x_n]$ can be written with its largest term first, followed by its
other, smaller, terms in descending order:

\[ f(X) = c_\alpha X^\alpha + \text{lower terms}. \]

Define its leading term to be $\mathrm{LT}(f) = c_\alpha X^\alpha$ and its Degree to be $\mathrm{DEG}(f) = \alpha$.
Call $f(X)$ monic if $\mathrm{LT}(f) = X^\alpha$; that is, if $c_\alpha = 1$.

There are many examples of monomial orders, but we shall give only the two
most popular ones.

Definition. The lexicographic order on $\mathbb{N}^n$ is defined by $\alpha \preceq_{\mathrm{lex}} \beta$ in case
$\alpha = \beta$ or the first nonzero coordinate in $\beta - \alpha$ is positive.⁸

The term lexicographic refers to the standard ordering in a dictionary. If
$\alpha \prec_{\mathrm{lex}} \beta$, then they agree for the first $i - 1$ coordinates (where $i \geq 1$), that is,
$\alpha_1 = \beta_1, \dots, \alpha_{i-1} = \beta_{i-1}$, and there is strict inequality: $\alpha_i < \beta_i$. For example,
the following German words are increasing in lexicographic order (the letters are
ordered $a < b < c < \cdots < z$):

ausgehen
ausladen
auslagen
auslegen
bedeuten
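
The definition can be checked mechanically. The following is a minimal Python sketch, not part of the original text: the function name `lex_leq` and the encoding of letters as integers are assumptions made only for illustration. It compares tuples by the first coordinate in which they differ and confirms that the five words above are listed in increasing order.

```python
def lex_leq(alpha, beta):
    """Return True when alpha <=_lex beta: either alpha == beta or the
    first nonzero coordinate of beta - alpha is positive."""
    for a, b in zip(alpha, beta):
        if a != b:
            return a < b
    return True  # all coordinates agree, so alpha == beta


words = ["bedeuten", "auslegen", "ausgehen", "auslagen", "ausladen"]

# Encode each 8-letter word as a tuple of letter positions (a=0, ..., z=25),
# so that a word becomes an element of N^8.
encoded = {w: tuple(ord(ch) - ord("a") for ch in w) for w in words}

# Python compares equal-length tuples coordinate by coordinate, which agrees
# with lex_leq; sorting by the encoded tuple gives increasing lex order.
in_order = sorted(words, key=lambda w: encoded[w])
print(in_order)
```

Python's built-in tuple comparison happens to coincide with the lexicographic order on tuples of equal length, which is why `sorted` can be used directly here.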

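Proposition 7.54. The lexicographic order $\preceq_{\mathrm{lex}}$ on $\mathbb{N}^n$ is a monomial order.

Proof. First, we show that the lexicographic order is a partial order. The relation
$\preceq_{\mathrm{lex}}$ is reflexive, for its definition shows that $\alpha \preceq_{\mathrm{lex}} \alpha$. To prove antisymmetry,
assume that $\alpha \preceq_{\mathrm{lex}} \beta$ and $\beta \preceq_{\mathrm{lex}} \alpha$. If $\alpha \neq \beta$, there is a first coordinate, say
the $i$th, where they disagree. For notation, we may assume that $\alpha_i < \beta_i$. But this
contradicts $\beta \preceq_{\mathrm{lex}} \alpha$. To prove transitivity, suppose that $\alpha \prec_{\mathrm{lex}} \beta$ and $\beta \prec_{\mathrm{lex}} \gamma$
(it suffices to consider strict inequality). Now $\alpha_1 = \beta_1, \dots, \alpha_{i-1} = \beta_{i-1}$ and
$\alpha_i < \beta_i$. Let $\gamma_p$ be the first coordinate with $\beta_p < \gamma_p$. If $p < i$, then

\[ \gamma_1 = \beta_1 = \alpha_1, \ \dots, \ \gamma_{p-1} = \beta_{p-1} = \alpha_{p-1}, \ \alpha_p = \beta_p < \gamma_p; \]

if $p \geq i$, then

\[ \gamma_1 = \beta_1 = \alpha_1, \ \dots, \ \gamma_{i-1} = \beta_{i-1} = \alpha_{i-1}, \ \alpha_i < \beta_i \leq \gamma_i. \]

In either case, the first nonzero coordinate of $\gamma - \alpha$ is positive; that is,
$\alpha \prec_{\mathrm{lex}} \gamma$.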

Next, we show that the lexicographic order is a well-order. If $S$ is a nonempty
subset of $\mathbb{N}^n$, define

\[ C_1 = \{\text{all first coordinates of $n$-tuples in } S\}, \]

and define $\delta_1$ to be the smallest number in $C_1$ (note that $C_1$ is a nonempty subset
of the well-ordered set $\mathbb{N}$). Define

\[ C_2 = \{\text{all second coordinates of $n$-tuples } (\delta_1, \alpha_2, \dots, \alpha_n) \in S\}. \]

⁸ The difference $\beta - \alpha$ may not lie in $\mathbb{N}^n$, but it does lie in $\mathbb{Z}^n$.

Since $C_2 \neq \emptyset$, it contains a smallest number, $\delta_2$. Similarly, for all $i < n$,
define $C_{i+1}$ as all the $(i+1)$th coordinates of those $n$-tuples in $S$ whose first
$i$ coordinates are $(\delta_1, \delta_2, \dots, \delta_i)$, and define $\delta_{i+1}$ to be the smallest number in
$C_{i+1}$. By construction, the $n$-tuple $\delta = (\delta_1, \delta_2, \dots, \delta_n)$ lies in $S$. Moreover, if
$\alpha = (\alpha_1, \alpha_2, \dots, \alpha_n) \in S$ and $\alpha \neq \delta$, let $i$ be the first coordinate in which
$\alpha$ and $\delta$ disagree. Since $\alpha$ agrees with $\delta$ in its first $i - 1$ coordinates, we have
$\alpha_i \in C_i$, and so $\alpha_i > \delta_i$. Hence, the first nonzero coordinate of

\[ \alpha - \delta = (\alpha_1 - \delta_1, \alpha_2 - \delta_2, \dots, \alpha_n - \delta_n) \]

is positive, and so $\delta \prec_{\mathrm{lex}} \alpha$. Therefore, the lexicographic order is a
well-order.

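Assume that $\alpha \preceq_{\mathrm{lex}} \beta$; we claim that

\[ \alpha + \gamma \preceq_{\mathrm{lex}} \beta + \gamma \]

for all $\gamma \in \mathbb{N}^n$. If $\alpha = \beta$, then $\alpha + \gamma = \beta + \gamma$. If $\alpha \prec_{\mathrm{lex}} \beta$, then the first nonzero
coordinate of $\beta - \alpha$ is positive. But

\[ (\beta + \gamma) - (\alpha + \gamma) = \beta - \alpha, \]

and so $\alpha + \gamma \prec_{\mathrm{lex}} \beta + \gamma$. Therefore, $\preceq_{\mathrm{lex}}$ is a monomial order. •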

In the lexicographic order, $x_1 \succ x_2 \succ x_3 \succ \cdots$, for

\[ (1, 0, \dots, 0) \succ (0, 1, 0, \dots, 0) \succ (0, 0, 1, 0, \dots, 0) \succ \cdots. \]

Any permutation of the variables $x_{\sigma(1)}, \dots, x_{\sigma(n)}$ yields a different lexicographic
order on $\mathbb{N}^n$.

Remark. If $X$ is any well-ordered set with order $\preceq$, then the lexicographic
order on $X^n$ can be defined by $a = (a_1, \dots, a_n) \preceq_{\mathrm{lex}} b = (b_1, \dots, b_n)$ in case
$a = b$ or if they first disagree in the $i$th coordinate and $a_i \prec b_i$. It is a simple
matter to generalize Proposition 7.54 by replacing $\mathbb{N}$ with $X$. ◄

Definition. If $X$ is a set and $n \geq 1$, we define a positive word on $X$ of length $n$
to be a function $w\colon \{1, 2, \dots, n\} \to X$, and we denote $w$ by

\[ w = x_1x_2 \cdots x_n, \]

where $x_i = w(i)$. Of course, $w$ need not be injective; that is, there may be
repetitions of $x$'s. Two positive words can be multiplied: if $w' = x'_1 \cdots x'_m$, then

\[ ww' = x_1x_2 \cdots x_n x'_1 \cdots x'_m. \]

We introduce the empty word, denoted by $1$, as the word of length $0$ such that
$1w = w = w1$ for all positive words $w$. With these definitions, the set $\mathcal{W}(X)$,
consisting of all the positive words on $X$, is a monoid.

Corollary 7.55. If $X$ is a well-ordered set, then $\mathcal{W}(X)$ is well-ordered in the
lexicographic order (which we also denote by $\preceq_{\mathrm{lex}}$).

Proof. We will only give a careful definition of the lexicographic order here;
the proof that it is a well-order is left to the reader. First, define $1 \preceq_{\mathrm{lex}} w$ for all
$w \in \mathcal{W}(X)$. Next, given words $u = x_1 \cdots x_p$ and $v = y_1 \cdots y_q$ in $\mathcal{W}(X)$, make
them the same length by adjoining $1$'s at the end of the shorter word, and rename
them $u'$ and $v'$ in $\mathcal{W}(X)$. If $m \geq \max\{p, q\}$, we may regard $u', v' \in X^m$, and
we define $u \preceq_{\mathrm{lex}} v$ if $u' \preceq_{\mathrm{lex}} v'$ in $X^m$. (This is the word order commonly used
in dictionaries, where a blank precedes any letter: for example, muse precedes
museum.) •

Example 7.56. 

Given a monomial order on $\mathbb{N}^n$, each polynomial $f(X) = \sum_\alpha c_\alpha X^\alpha \in k[X]
= k[x_1, \dots, x_n]$ can be written with the multidegrees of its terms in descending
order: $\alpha_1 \succ \alpha_2 \succ \cdots \succ \alpha_p$. Write

\[ \mathrm{multiword}(f) = \alpha_1 \cdots \alpha_p \in \mathcal{W}(\mathbb{N}^n). \]

Let $c_\beta X^\beta$ be a nonzero term in $f(X)$, let $g(X) \in k[X]$ have $\mathrm{DEG}(g) \prec \beta$, and
write

\[ f(X) = h(X) + c_\beta X^\beta + \ell(X), \]

where $h(X)$ is the sum of all terms in $f(X)$ of multidegree $\succ \beta$, and $\ell(X)$ is the
sum of all terms in $f(X)$ of multidegree $\prec \beta$. We claim that

\[ \mathrm{multiword}(f(X) - c_\beta X^\beta + g(X)) \prec_{\mathrm{lex}} \mathrm{multiword}(f) \quad\text{in } \mathcal{W}(\mathbb{N}^n). \]

The sum of the terms in $f(X) - c_\beta X^\beta + g(X)$ with multidegree $\succ \beta$ is $h(X)$,
while the sum of the lower terms is $\ell(X) + g(X)$. But $\mathrm{DEG}(\ell + g) \prec \beta$, by
Exercise 7.49 on page 564. Therefore, the initial terms of $f(X)$ and $f(X) -
c_\beta X^\beta + g(X)$ agree, while the next term of $f(X) - c_\beta X^\beta + g(X)$ has multidegree
$\prec \beta$, and this proves the claim.

Since $\mathcal{W}(\mathbb{N}^n)$ is well-ordered, it follows that any sequence of steps of the
form $f(X) \to f(X) - c_\beta X^\beta + g(X)$, where $c_\beta X^\beta$ is a nonzero term of $f(X)$
and $\mathrm{DEG}(g) \prec \beta$, must be finite. ◄

Here is the second popular monomial order. 

Definition. The degree-lexicographic order on $\mathbb{N}^n$ is defined by $\alpha \preceq_{\mathrm{dlex}} \beta$ in
case $\alpha = \beta$ or

\[ |\alpha| = \sum_{i=1}^{n} \alpha_i < \sum_{i=1}^{n} \beta_i = |\beta| \]

or, if $|\alpha| = |\beta|$, then the first nonzero coordinate in $\beta - \alpha$ is positive.

In other words, given $\alpha = (\alpha_1, \dots, \alpha_n)$ and $\beta = (\beta_1, \dots, \beta_n)$, first check
total degrees: if $|\alpha| < |\beta|$, then $\alpha \preceq_{\mathrm{dlex}} \beta$; if there is a tie, that is, if $\alpha$ and
$\beta$ have the same total degree, then order them lexicographically. For example,
$(1, 2, 3, 0) \preceq_{\mathrm{dlex}} (0, 2, 5, 0)$ and $(1, 2, 3, 4) \preceq_{\mathrm{dlex}} (1, 2, 5, 2)$.
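
The two-stage comparison can be expressed as a sort key. This is a small Python sketch under the assumption that multidegrees are stored as tuples; the names `deglex_key` and `deglex_leq` are hypothetical and do not come from the text. It reproduces the two comparisons just stated and shows that the plain lexicographic order disagrees on the first pair.

```python
def deglex_key(alpha):
    """Sort key realising the degree-lexicographic order: compare total
    degrees first, then break ties lexicographically."""
    return (sum(alpha), tuple(alpha))


def deglex_leq(alpha, beta):
    """Return True when alpha <=_dlex beta."""
    return deglex_key(alpha) <= deglex_key(beta)


first = deglex_leq((1, 2, 3, 0), (0, 2, 5, 0))    # total degrees 6 < 7
second = deglex_leq((1, 2, 3, 4), (1, 2, 5, 2))   # tie at 10, then lex
lex_only = (1, 2, 3, 0) <= (0, 2, 5, 0)           # plain lex compares 1 with 0 first
print(first, second, lex_only)
```

The last value illustrates that the two orders are genuinely different: lexicographically, $(0, 2, 5, 0)$ precedes $(1, 2, 3, 0)$.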


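Proposition 7.57. The degree-lexicographic order $\preceq_{\mathrm{dlex}}$ is a monomial order
on $\mathbb{N}^n$.

Proof. It is routine to show that $\preceq_{\mathrm{dlex}}$ is a partial order on $\mathbb{N}^n$. To see that it is
a well-order, let $S$ be a nonempty subset of $\mathbb{N}^n$. The total degrees of elements
in $S$ form a nonempty subset of $\mathbb{N}$, and so there is a smallest such, say, $t$. The
nonempty subset of all $\alpha \in S$ having total degree $t$ has a smallest element,
because the degree-lexicographic order $\preceq_{\mathrm{dlex}}$ coincides with the lexicographic
order $\preceq_{\mathrm{lex}}$ on this subset. Therefore, there is a smallest element in $S$ in the
degree-lexicographic order.

Assume that $\alpha \preceq_{\mathrm{dlex}} \beta$ and $\gamma \in \mathbb{N}^n$. Now $|\alpha + \gamma| = |\alpha| + |\gamma|$, so that
$|\alpha| = |\beta|$ implies $|\alpha + \gamma| = |\beta + \gamma|$ and $|\alpha| < |\beta|$ implies $|\alpha + \gamma| < |\beta + \gamma|$.
In the latter case, $\alpha + \gamma \preceq_{\mathrm{dlex}} \beta + \gamma$ at once; in the former case, $\alpha \preceq_{\mathrm{lex}} \beta$, and
Proposition 7.54 shows that $\alpha + \gamma \preceq_{\mathrm{lex}} \beta + \gamma$, so that $\alpha + \gamma \preceq_{\mathrm{dlex}} \beta + \gamma$. •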

The next proposition shows, with respect to any monomial order, that poly- 
nomials in several variables behave like polynomials in a single variable. 


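Proposition 7.58. Let $\preceq$ be a monomial order on $\mathbb{N}^n$, and let $f(X), g(X),
h(X) \in k[X] = k[x_1, \dots, x_n]$.

(i) If $\mathrm{DEG}(f) = \mathrm{DEG}(g)$, then $\mathrm{LT}(g) \mid \mathrm{LT}(f)$.

(ii) $\mathrm{LT}(hg) = \mathrm{LT}(h)\mathrm{LT}(g)$.

(iii) If $\mathrm{DEG}(f) = \mathrm{DEG}(hg)$, then $\mathrm{LT}(g) \mid \mathrm{LT}(f)$.

Proof.

(i) If $\mathrm{DEG}(f) = \alpha = \mathrm{DEG}(g)$, then $\mathrm{LT}(f) = cX^\alpha$ and $\mathrm{LT}(g) = dX^\alpha$. Hence,
$\mathrm{LT}(g) \mid \mathrm{LT}(f)$ [and also $\mathrm{LT}(f) \mid \mathrm{LT}(g)$].

(ii) Let $h(X) = cX^\gamma + \text{lower terms}$ and let $g(X) = bX^\beta + \text{lower terms}$, so
that $\mathrm{LT}(h) = cX^\gamma$ and $\mathrm{LT}(g) = bX^\beta$. Clearly, $cbX^{\gamma+\beta}$ is a nonzero term of
$h(X)g(X)$. To see that it is the leading term, let $c_\mu X^\mu$ be a term of $h(X)$ with
$\mu \preceq \gamma$, and let $b_\nu X^\nu$ be a term of $g(X)$ with $\nu \preceq \beta$, where at least one of these
inequalities is strict. Now $\mathrm{DEG}(c_\mu X^\mu b_\nu X^\nu) = \mu + \nu$; since $\preceq$ is a monomial
order, we have $\mu + \nu \preceq \gamma + \nu \preceq \gamma + \beta$, with at least one strict inequality. Thus,
$cbX^{\gamma+\beta}$ is the term in $h(X)g(X)$ with largest multidegree.

(iii) Since $\mathrm{DEG}(f) = \mathrm{DEG}(hg)$, part (i) gives $\mathrm{LT}(hg) \mid \mathrm{LT}(f)$ and part (ii) gives
$\mathrm{LT}(h)\mathrm{LT}(g) = \mathrm{LT}(hg)$; hence, $\mathrm{LT}(g) \mid \mathrm{LT}(f)$. •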
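
Part (ii) can be observed on a concrete pair of polynomials. The sketch below is only illustrative: it assumes polynomials in $k[x, y]$ are stored as Python dictionaries mapping exponent tuples to coefficients, it uses the degree-lexicographic order, and the helper names `leading_term` and `multiply` are hypothetical.

```python
from itertools import product


def deglex_key(alpha):
    """Sort key for the degree-lexicographic order on exponent tuples."""
    return (sum(alpha), alpha)


def leading_term(poly, key=deglex_key):
    """Return (multidegree, coefficient) of the leading term of a nonzero
    polynomial stored as {exponent tuple: coefficient}."""
    alpha = max(poly, key=key)
    return alpha, poly[alpha]


def multiply(p, q):
    """Multiply two polynomials term by term, collecting like monomials."""
    result = {}
    for (a, c), (b, d) in product(p.items(), q.items()):
        e = tuple(x + y for x, y in zip(a, b))
        result[e] = result.get(e, 0) + c * d
    return {e: c for e, c in result.items() if c != 0}


# h = 2x^2 y + 3y and g = x y^2 - 5x, with exponents listed as (x, y).
h = {(2, 1): 2, (0, 1): 3}
g = {(1, 2): 1, (1, 0): -5}

(gamma, c), (beta, b) = leading_term(h), leading_term(g)
lt_hg = leading_term(multiply(h, g))
print(lt_hg)  # multidegree gamma + beta with coefficient c * b
```

Here $\mathrm{LT}(h) = 2x^2y$ and $\mathrm{LT}(g) = xy^2$, and the product has leading term $2x^3y^3$.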
Exercises

7.47 (i) Write the first 10 monic monomials in $k[x, y]$ in lexicographic order and
in degree-lexicographic order.

(ii) Write all the monic monomials in $k[x, y, z]$ of total degree at most 2 in
lexicographic order and in degree-lexicographic order.

7.48 Give an example of a well-ordered set $X$ that contains an element $u$ for which
$\{x \in X : x \preceq u\}$ is infinite.

*7.49 Let $\preceq$ be a monomial order on $\mathbb{N}^n$, and let $f(X), g(X) \in k[X] = k[x_1, \dots, x_n]$ be
nonzero polynomials. Prove that if $f + g \neq 0$, then

\[ \mathrm{DEG}(f + g) \preceq \max\{\mathrm{DEG}(f), \mathrm{DEG}(g)\}, \]

and that strict inequality can occur only if $\mathrm{DEG}(f) = \mathrm{DEG}(g)$.


Generalized Division Algorithm 

We are now going to use monomial orders to give a division algorithm for poly- 
nomials in several variables. 

Definition. Let $\preceq$ be a monomial order on $\mathbb{N}^n$ and let $f(X), g(X) \in k[X] =
k[x_1, \dots, x_n]$. If there is a nonzero term $c_\beta X^\beta$ in $f(X)$ with $\mathrm{LT}(g) \mid c_\beta X^\beta$ and

\[ h(X) = f(X) - \frac{c_\beta X^\beta}{\mathrm{LT}(g)}\, g(X), \]

then reduction $f \xrightarrow{g} h$ is the replacement of $f$ by $h$.

Reduction is precisely the usual step involved in long division of polynomials
in one variable; if $f \xrightarrow{g} h$, then we have used $g$ to eliminate a term from $f$,
yielding $h$. Of course, a special case of reduction is when $c_\beta X^\beta = \mathrm{LT}(f)$.
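
A single reduction step can be sketched in Python. The representation (dictionaries from exponent tuples to coefficients), the choice of the degree-lexicographic order, the choice to eliminate the largest divisible term, and the name `reduce_once` are all assumptions made for this illustration, not part of the text.

```python
from fractions import Fraction


def deglex_key(alpha):
    """Sort key for the degree-lexicographic order on exponent tuples."""
    return (sum(alpha), alpha)


def reduce_once(f, g, key=deglex_key):
    """Perform one reduction f -> h by g, eliminating the largest term of f
    that LT(g) divides; return None when no term of f is divisible."""
    gamma = max(g, key=key)                      # DEG(g)
    for beta in sorted(f, key=key, reverse=True):
        if all(b >= c for b, c in zip(beta, gamma)):
            # c_beta X^beta / LT(g) has coefficient coeff and exponent shift.
            coeff = Fraction(f[beta]) / g[gamma]
            shift = tuple(b - c for b, c in zip(beta, gamma))
            h = dict(f)
            for alpha, a in g.items():
                e = tuple(x + y for x, y in zip(alpha, shift))
                h[e] = h.get(e, 0) - coeff * a
            return {e: c for e, c in h.items() if c != 0}
    return None


# f = x^2 y + y and g = x y - 1 in k[x, y]; exponents are listed as (x, y).
f = {(2, 1): 1, (0, 1): 1}
g = {(1, 1): 1, (0, 0): -1}
h = reduce_once(f, g)
print(h)  # the polynomial x + y
```

Here $\mathrm{LT}(g) = xy$ divides the term $x^2y$ of $f$, and $h = f - x(xy - 1) = x + y$; no term of $h$ is divisible by $xy$, so a second call returns `None`.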

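Proposition 7.59. Let $\preceq$ be a monomial order on $\mathbb{N}^n$, let $f(X), g(X) \in k[X] =
k[x_1, \dots, x_n]$, and assume that $f \xrightarrow{g} h$; that is, there is a nonzero term $c_\beta X^\beta$ in
$f(X)$ with $\mathrm{LT}(g) \mid c_\beta X^\beta$ and $h(X) = f(X) - \frac{c_\beta X^\beta}{\mathrm{LT}(g)}\, g(X)$.

If $\beta = \mathrm{DEG}(f)$, i.e., if $c_\beta X^\beta = \mathrm{LT}(f)$, then either

\[ h(X) = 0 \quad\text{or}\quad \mathrm{DEG}(h) \prec \mathrm{DEG}(f); \]

if $\beta \prec \mathrm{DEG}(f)$, then $\mathrm{DEG}(h) = \mathrm{DEG}(f)$. In either case,

\[ \mathrm{DEG}\Bigl(\frac{c_\beta X^\beta}{\mathrm{LT}(g)}\, g(X)\Bigr) \preceq \mathrm{DEG}(f). \]

Proof. Let us write

\[ f(X) = \mathrm{LT}(f) + c_\kappa X^\kappa + \text{lower terms}; \]

since $c_\beta X^\beta$ is a term of $f(X)$, we have $\beta \preceq \mathrm{DEG}(f)$. If $\mathrm{LT}(g) = a_\gamma X^\gamma$, so that
$\mathrm{DEG}(g) = \gamma$, let us write

\[ g(X) = a_\gamma X^\gamma + a_\lambda X^\lambda + \text{lower terms}. \]

Hence,

\[
\begin{aligned}
h(X) &= f(X) - \frac{c_\beta X^\beta}{\mathrm{LT}(g)}\, g(X) \\
     &= f(X) - \frac{c_\beta X^\beta}{\mathrm{LT}(g)} \bigl[\mathrm{LT}(g) + a_\lambda X^\lambda + \cdots\bigr] \\
     &= \bigl[f(X) - c_\beta X^\beta\bigr] - \frac{c_\beta X^\beta}{\mathrm{LT}(g)} \bigl[a_\lambda X^\lambda + \cdots\bigr].
\end{aligned}
\]

Now $\mathrm{LT}(g) \mid c_\beta X^\beta$ says that $\beta - \gamma \in \mathbb{N}^n$. We claim that

\[ \mathrm{DEG}\Bigl(-\frac{c_\beta X^\beta}{\mathrm{LT}(g)} \bigl[a_\lambda X^\lambda + \cdots\bigr]\Bigr) = \lambda + \beta - \gamma \prec \beta. \]

The inequality holds, for $\lambda \prec \gamma$ implies $\lambda + (\beta - \gamma) \prec \gamma + (\beta - \gamma) = \beta$.
To see that $\lambda + \beta - \gamma$ is the Degree, it suffices to show that $\lambda + \beta - \gamma =
\mathrm{DEG}\bigl(-\frac{c_\beta X^\beta}{\mathrm{LT}(g)}\, a_\lambda X^\lambda\bigr)$ is the largest multidegree occurring in $-\frac{c_\beta X^\beta}{\mathrm{LT}(g)} [a_\lambda X^\lambda + \cdots]$.
But if $a_\eta X^\eta$ is a lower term in $g(X)$, i.e., $\eta \prec \lambda$, then $\preceq$ being a monomial order
gives $\eta + (\beta - \gamma) \prec \lambda + (\beta - \gamma)$, as desired.

If $h(X) \neq 0$, then Exercise 7.49 on page 564 gives

\[ \mathrm{DEG}(h) \preceq \max\Bigl\{\mathrm{DEG}\bigl(f(X) - c_\beta X^\beta\bigr),\ \mathrm{DEG}\Bigl(-\frac{c_\beta X^\beta}{\mathrm{LT}(g)} \bigl[a_\lambda X^\lambda + \cdots\bigr]\Bigr)\Bigr\}. \]

Now if $\beta = \mathrm{DEG}(f)$, then $c_\beta X^\beta = \mathrm{LT}(f)$,

\[ f(X) - c_\beta X^\beta = f(X) - \mathrm{LT}(f) = c_\kappa X^\kappa + \text{lower terms}, \]

and, hence, $\mathrm{DEG}(f(X) - c_\beta X^\beta) = \kappa \prec \mathrm{DEG}(f)$. Therefore, $\mathrm{DEG}(h) \prec \mathrm{DEG}(f)$
in this case. If $\beta \prec \mathrm{DEG}(f)$, then $\mathrm{DEG}(f(X) - c_\beta X^\beta) = \mathrm{DEG}(f)$, while
$\mathrm{DEG}\bigl(-\frac{c_\beta X^\beta}{\mathrm{LT}(g)} [a_\lambda X^\lambda + \cdots]\bigr) \prec \beta \prec \mathrm{DEG}(f)$, and so $\mathrm{DEG}(h) = \mathrm{DEG}(f)$ in this
case.

The last inequality is clear, for

\[ \frac{c_\beta X^\beta}{\mathrm{LT}(g)}\, g(X) = c_\beta X^\beta + \frac{c_\beta X^\beta}{\mathrm{LT}(g)} \bigl[a_\lambda X^\lambda + \cdots\bigr]. \]

Since $\mathrm{DEG}\bigl(-\frac{c_\beta X^\beta}{\mathrm{LT}(g)} [a_\lambda X^\lambda + \cdots]\bigr) \prec \beta$, we see that

\[ \mathrm{DEG}\Bigl(\frac{c_\beta X^\beta}{\mathrm{LT}(g)}\, g(X)\Bigr) = \beta \preceq \mathrm{DEG}(f). \quad • \]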

Definition. Let $\{g_1, \dots, g_m\}$, where $g_i = g_i(X) \in k[X]$. A polynomial $r(X)$ is
reduced mod $\{g_1, \dots, g_m\}$ if either $r(X) = 0$ or no $\mathrm{LT}(g_i)$ divides any nonzero
term of $r(X)$.

Here is the division algorithm for polynomials in several variables. Because
the algorithm requires the “divisor polynomials” $\{g_1, \dots, g_m\}$ to be used in a
specific order (after all, an algorithm must give explicit directions), we will be
using an $m$-tuple of polynomials instead of a subset of polynomials. We use
the notation $[g_1, \dots, g_m]$ for the $m$-tuple whose $i$th entry is $g_i$, because the
usual notation $(g_1, \dots, g_m)$ would be confused with the notation for the ideal
$(g_1, \dots, g_m)$ generated by the $g_i$.


Theorem 7.60 (Division Algorithm in $k[x_1, \dots, x_n]$). Let $\preceq$ be a monomial
order on $\mathbb{N}^n$, and let $k[X] = k[x_1, \dots, x_n]$. If $f(X) \in k[X]$ and $G =
[g_1(X), \dots, g_m(X)]$ is an $m$-tuple of polynomials in $k[X]$, then there is an
algorithm giving polynomials $r(X), a_1(X), \dots, a_m(X) \in k[X]$ with

\[ f = a_1g_1 + \cdots + a_mg_m + r, \]

where $r$ is reduced mod $\{g_1, \dots, g_m\}$, and

\[ \mathrm{DEG}(a_ig_i) \preceq \mathrm{DEG}(f) \text{ for all } i. \]

Proof. Once a monomial order is chosen, so that leading terms are defined,
the algorithm is a straightforward generalization of the division algorithm in one
variable. First, reduce mod $g_1$ as many times as possible, then reduce mod $g_2$,
then reduce mod $g_1$ again, etc. Here is a pseudocode describing the algorithm
more precisely.

    Input:  f(X) = Σ_β c_β X^β,  [g_1, ..., g_m]
    Output: r, a_1, ..., a_m

    r := f;  a_i := 0 for all i
    WHILE r is not reduced mod {g_1, ..., g_m} DO
        select smallest i with LT(g_i) | c_β X^β for some nonzero term c_β X^β of r
        a_i := a_i + [c_β X^β / LT(g_i)]
        r := r − [c_β X^β / LT(g_i)] g_i
    END WHILE

At each step $h_j \xrightarrow{g_i} h_{j+1}$ of the algorithm, we have $\mathrm{multiword}(h_j) \succ_{\mathrm{lex}}
\mathrm{multiword}(h_{j+1})$ in $\mathcal{W}(\mathbb{N}^n)$, by Example 7.56, and so the algorithm does stop,
because $\preceq_{\mathrm{lex}}$ is a well-order on $\mathcal{W}(\mathbb{N}^n)$. Obviously, the output $r(X)$ is reduced
mod $\{g_1, \dots, g_m\}$, for if it has a term divisible by some $\mathrm{LT}(g_i)$, then one further
reduction is possible.

Finally, each term of $a_i(X)$ has the form $c_\beta X^\beta / \mathrm{LT}(g_i)$ for some intermediate
output $h(X)$ (as one sees in the pseudocode). It now follows from Proposition 7.59
that either $a_ig_i = 0$ or $\mathrm{DEG}(a_ig_i) \preceq \mathrm{DEG}(f)$. •
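
The pseudocode can be turned into a short runnable sketch. What follows is one possible Python rendering, not the author's implementation: polynomials are assumed to be dictionaries from exponent tuples to coefficients, the monomial order is fixed to be degree-lexicographic, and, where the pseudocode says only "for some $\beta$", this sketch chooses the largest divisible term. The names `divide` and `divides` are hypothetical.

```python
from fractions import Fraction


def deglex_key(alpha):
    """Sort key for the degree-lexicographic order on exponent tuples."""
    return (sum(alpha), alpha)


def divides(gamma, beta):
    """X^gamma divides X^beta exactly when beta - gamma lies in N^n."""
    return all(c <= b for c, b in zip(gamma, beta))


def divide(f, G, key=deglex_key):
    """Return (quotients, r) with f = sum(a_i * g_i) + r and r reduced mod G.
    Polynomials are dicts {exponent tuple: coefficient}."""
    r = {e: Fraction(c) for e, c in f.items() if c != 0}
    quotients = [dict() for _ in G]
    leads = [max(g, key=key) for g in G]          # DEG(g_i) for each i
    while True:
        # Smallest i whose leading term divides some term of r.
        step = None
        for i, gamma in enumerate(leads):
            candidates = [beta for beta in r if divides(gamma, beta)]
            if candidates:
                step = (i, max(candidates, key=key))
                break
        if step is None:                          # r is reduced mod G
            return quotients, r
        i, beta = step
        gamma = leads[i]
        coeff = r[beta] / G[i][gamma]
        shift = tuple(b - c for b, c in zip(beta, gamma))
        quotients[i][shift] = quotients[i].get(shift, 0) + coeff
        for alpha, a in G[i].items():             # r := r - (term / LT(g_i)) g_i
            e = tuple(x + y for x, y in zip(alpha, shift))
            r[e] = r.get(e, 0) - coeff * a
            if r[e] == 0:
                del r[e]


# Divide f = x y^2 + 1 by G = [x y + 1, y + 1]; exponents are listed as (x, y).
f = {(1, 2): 1, (0, 0): 1}
G = [{(1, 1): 1, (0, 0): 1}, {(0, 1): 1, (0, 0): 1}]
quotients, r = divide(f, G)
print(quotients, r)
```

In this run $a_1 = y$, $a_2 = -1$, and $r = 2$, so that $xy^2 + 1 = y(xy + 1) - (y + 1) + 2$. Exact rational arithmetic via `Fraction` stands in for the field $k = \mathbb{Q}$.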


Definition. Given a monomial order on $\mathbb{N}^n$, a polynomial $f(X) \in k[X]$, and
an $m$-tuple $G = [g_1, \dots, g_m]$, we call the output $r(X)$ of the division algorithm
the remainder of $f$ mod $G$.

Note that the remainder $r$ of $f$ mod $G$ is reduced mod $\{g_1, \dots, g_m\}$ and
$f - r \in I = (g_1, \dots, g_m)$. The algorithm requires that $G$ be an $m$-tuple, because
of the command

    select smallest i with LT(g_i) | c_β X^β for some β

specifying the order of reductions. The next example shows that the remainder
may depend not only on the set of polynomials $\{g_1, \dots, g_m\}$ but also on the
ordering of the coordinates in the $m$-tuple $G = [g_1, \dots, g_m]$. That is, if $\sigma \in
S_m$ is a permutation and $G_\sigma = [g_{\sigma(1)}, \dots, g_{\sigma(m)}]$, then the remainder $r_\sigma$ of $f$
mod $G_\sigma$ may not be the same as the remainder $r$ of $f$ mod $G$. Even worse,
it is possible that $r \neq 0$ and $r_\sigma = 0$, so that the remainder mod $G$ is not the
obstruction to $f$ being in the ideal $(g_1, \dots, g_m)$.


Example 7.61. 

Let $f(x, y, z) = x^2y^2 + xy$, and let $G = [g_1, g_2, g_3]$, where

\[
\begin{aligned}
g_1 &= y^2 + z^2 \\
g_2 &= x^2y + yz \\
g_3 &= z^3 + xy.
\end{aligned}
\]

We use the degree-lexicographic order on $\mathbb{N}^3$. Now $y^2 = \mathrm{LT}(g_1) \mid \mathrm{LT}(f) =
x^2y^2$, and so $f \xrightarrow{g_1} h$, where $h = f - \frac{x^2y^2}{y^2}(y^2 + z^2) = -x^2z^2 + xy$.

The polynomial $-x^2z^2 + xy$ is reduced mod $G$, because neither $-x^2z^2$ nor
$xy$ is divisible by any of the leading terms $\mathrm{LT}(g_1) = y^2$, $\mathrm{LT}(g_2) = x^2y$, or
$\mathrm{LT}(g_3) = z^3$.

On the other hand, let us apply the division algorithm using the 3-tuple $G' =
[g_2, g_1, g_3]$. The first reduction gives $f \xrightarrow{g_2} h'$, where

\[ h' = f - \frac{x^2y^2}{x^2y}(x^2y + yz) = -y^2z + xy. \]

Now $h'$ is not reduced, and reducing mod $g_1$ gives

\[ h' - \frac{-y^2z}{y^2}(y^2 + z^2) = z^3 + xy. \]

But $z^3 + xy = g_3$, and so $z^3 + xy \xrightarrow{g_3} 0$.

Thus, the remainder depends on the ordering of the divisor polynomials $g_i$
in the $m$-tuple.

For a simpler example of different remainders (but with neither remainder
being 0), see Exercise 7.50. ◄
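
The two computations in this example can be replayed mechanically. The sketch below is an illustration only; it assumes the dictionary representation of polynomials with exponents listed as $(x, y, z)$, the degree-lexicographic order, and a hypothetical helper `remainder` that keeps just the remainder of the division algorithm (choosing, at each step, the smallest $i$ and then the largest divisible term).

```python
from fractions import Fraction


def deglex_key(alpha):
    """Sort key for the degree-lexicographic order on exponent tuples."""
    return (sum(alpha), alpha)


def remainder(f, G, key=deglex_key):
    """Remainder of f mod the tuple G, following the division algorithm."""
    r = {e: Fraction(c) for e, c in f.items()}
    leads = [max(g, key=key) for g in G]
    while True:
        # First pair (i, beta): smallest i, then largest divisible term of r.
        step = next(((i, beta) for i, gamma in enumerate(leads)
                     for beta in sorted(r, key=key, reverse=True)
                     if all(c <= b for c, b in zip(gamma, beta))), None)
        if step is None:
            return r
        i, beta = step
        coeff = r[beta] / G[i][leads[i]]
        shift = tuple(b - c for b, c in zip(beta, leads[i]))
        for alpha, a in G[i].items():
            e = tuple(x + y for x, y in zip(alpha, shift))
            r[e] = r.get(e, 0) - coeff * a
            if r[e] == 0:
                del r[e]


f = {(2, 2, 0): 1, (1, 1, 0): 1}       # x^2 y^2 + x y
g1 = {(0, 2, 0): 1, (0, 0, 2): 1}      # y^2 + z^2
g2 = {(2, 1, 0): 1, (0, 1, 1): 1}      # x^2 y + y z
g3 = {(0, 0, 3): 1, (1, 1, 0): 1}      # z^3 + x y

r = remainder(f, [g1, g2, g3])         # remainder mod G
r_prime = remainder(f, [g2, g1, g3])   # remainder mod G'
print(r, r_prime)
```

The first remainder is $-x^2z^2 + xy$ and the second is the zero polynomial (the empty dictionary), in agreement with the hand computation.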

The dependence of the remainder on the order of the $g_i$ in the $m$-tuple $G =
[g_1, \dots, g_m]$ will be treated in the next subsection.


Exercises

*7.50 Let $G = [x - y, x - z]$ and $G' = [x - z, x - y]$. Show that the remainder of $x$
mod $G$ (degree-lexicographic order) is distinct from the remainder of $x$ mod $G'$.

7.51 Use the degree-lexicographic order in this exercise.

(i) Find the remainder of $x^7y^2 + x^3y^2 - y + 1$ mod $[xy^2 - x, x - y^3]$.

(ii) Find the remainder of $x^7y^2 + x^3y^2 - y + 1$ mod $[x - y^3, xy^2 - x]$.

7.52 Use the degree-lexicographic order in this exercise.

(i) Find the remainder of $x^2y + xy^2 + y^2$ mod $[y^2 - 1, xy - 1]$.

(ii) Find the remainder of $x^2y + xy^2 + y^2$ mod $[xy - 1, y^2 - 1]$.

*7.53 Let $c_\alpha X^\alpha$ be a nonzero monomial, and let $f(X), g(X) \in k[X]$ be polynomials
none of whose terms is divisible by $c_\alpha X^\alpha$. Prove that none of the terms of $f(X) -
g(X)$ is divisible by $c_\alpha X^\alpha$.

*7.54 An ideal $I$ in $k[X]$ is a monomial ideal if it is generated by monomials: $I =
(X^{\alpha(1)}, \dots, X^{\alpha(q)})$.

(i) Prove that $f(X) \in I$ if and only if each term of $f(X)$ is divisible by some
$X^{\alpha(i)}$.

(ii) Prove that if $G = [g_1, \dots, g_m]$ and $r$ is nonzero and reduced mod $G$, then $r$
does not lie in the monomial ideal $(\mathrm{LT}(g_1), \dots, \mathrm{LT}(g_m))$.

Gröbner Bases

For the remainder of this section we will assume that $\mathbb{N}^n$ is equipped with some
monomial order (the reader may use the degree-lexicographic order), so that
$\mathrm{LT}(f)$ is defined and the division algorithm makes sense.

We have seen that the remainder of $f$ mod $[g_1, \dots, g_m]$ obtained from the
division algorithm can depend on the order in which the $g_i$ are listed. A Gröbner
basis $\{g_1, \dots, g_m\}$ of the ideal $I = (g_1, \dots, g_m)$ is a basis such that, for any
of the $m$-tuples $G$ formed from the $g_i$, the remainder of $f$ mod $G$ is always the
obstruction to whether $f$ lies in $I$; this will be a consequence of the definition
(which is given to make sure Gröbner bases are sets and not $m$-tuples).


Definition. A set of polynomials $\{g_1, \dots, g_m\}$ is a Gröbner basis⁹ of the ideal
$I = (g_1, \dots, g_m)$ if, for each nonzero $f \in I$, $\mathrm{LT}(g_i) \mid \mathrm{LT}(f)$ for some $g_i$.

Example 7.61 shows that

\[ \{y^2 + z^2,\ x^2y + yz,\ z^3 + xy\} \]

is not a Gröbner basis of the ideal $I = (y^2 + z^2, x^2y + yz, z^3 + xy)$.


Proposition 7.62. A set $\{g_1, \dots, g_m\}$ of polynomials is a Gröbner basis of
$I = (g_1, \dots, g_m)$ if and only if, for each $m$-tuple $G_\sigma = [g_{\sigma(1)}, \dots, g_{\sigma(m)}]$
(where $\sigma \in S_m$), every $f \in I$ has remainder 0 mod $G_\sigma$.

Proof. Let $\{g_1, \dots, g_m\}$ be a Gröbner basis, and assume there is some
permutation $\sigma \in S_m$ and some $f \in I$ whose remainder mod $G_\sigma$ is not 0. Among all
such polynomials, choose $f$ of minimal Degree. Since $\{g_1, \dots, g_m\}$ is a Gröbner
basis, $\mathrm{LT}(g_i) \mid \mathrm{LT}(f)$ for some $i$; select the smallest $\sigma(i)$ for which there is a
reduction $f \xrightarrow{g_{\sigma(i)}} h$, and note that $h \in I$.
Since $\mathrm{DEG}(h) \prec \mathrm{DEG}(f)$, by Proposition 7.59, the division algorithm gives a
sequence of reductions $h = h_0 \to h_1 \to h_2 \to \cdots \to h_p = 0$. But the division
algorithm for $f$ adjoins $f \to h$ at the front, showing that 0 is the remainder of
$f$ mod $G_\sigma$, a contradiction.

Conversely, assume that every $f \in I = (g_1, \dots, g_m)$ has remainder 0 mod each
$G_\sigma$. If there is a nonzero $f \in I$ with $\mathrm{LT}(g_i) \nmid \mathrm{LT}(f)$ for every $i$, then in any
reduction $f \xrightarrow{g_i} h$, we have $\mathrm{LT}(h) = \mathrm{LT}(f)$. Hence, if $G = [g_1, \dots, g_m]$, the division
algorithm mod $G$ gives reductions $f \to h_1 \to h_2 \to \cdots \to h_p = r$ in which
$\mathrm{LT}(r) = \mathrm{LT}(f)$. Therefore, $r \neq 0$; that is, the remainder of $f$ mod $G$ is not
zero, and this is a contradiction. •

⁹ It was B. Buchberger who, in his dissertation, proved the main properties of Gröbner
bases. He named these bases to honor his thesis advisor, W. Gröbner.

Corollary 7.63. Let $I = (g_1, \dots, g_m)$ be an ideal, let $\{g_1, \dots, g_m\}$ be a
Gröbner basis of $I$, and let $G = [g_1, \dots, g_m]$ be any $m$-tuple formed from the
$g_i$. If $f(X) \in k[X]$, then there is a unique $r(X) \in k[X]$, which is reduced
mod $\{g_1, \dots, g_m\}$, such that $f - r \in I$; in fact, $r$ is the remainder of $f$ mod $G$.

Proof. The division algorithm gives a polynomial $r$ which is reduced mod
$\{g_1, \dots, g_m\}$, and polynomials $a_1, \dots, a_m$ with $f = a_1g_1 + \cdots + a_mg_m + r$;
clearly, $f - r = a_1g_1 + \cdots + a_mg_m \in I$.

To prove uniqueness, suppose that $r$ and $r'$ are reduced mod $\{g_1, \dots, g_m\}$
and that $f - r$ and $f - r'$ lie in $I$, so that $(f - r') - (f - r) = r - r' \in I$.
Since $r$ and $r'$ are reduced mod $\{g_1, \dots, g_m\}$, none of their terms is divisible by
any $\mathrm{LT}(g_i)$. If $r - r' \neq 0$, then Exercise 7.53 on page 568 says that no term of
$r - r'$ is divisible by any $\mathrm{LT}(g_i)$; in particular, $\mathrm{LT}(r - r')$ is not divisible by any
$\mathrm{LT}(g_i)$, and this contradicts Proposition 7.62. Therefore, $r = r'$. •

The next corollary shows that Gröbner bases resolve the problem of different
remainders in the division algorithm arising from different $m$-tuples.

Corollary 7.64. Let $I = (g_1, \dots, g_m)$ be an ideal, let $\{g_1, \dots, g_m\}$ be a
Gröbner basis of $I$, and let $G$ be the $m$-tuple $G = [g_1, \dots, g_m]$.

(i) If $f(X) \in k[X]$ and $G_\sigma = [g_{\sigma(1)}, \dots, g_{\sigma(m)}]$, where $\sigma \in S_m$ is a
permutation, then the remainder of $f$ mod $G$ is equal to the remainder of $f$
mod $G_\sigma$.

(ii) A polynomial $f \in I$ if and only if $f$ has remainder 0 mod $G$.

Proof.

(i) If $r$ is the remainder of $f$ mod $G$, then Corollary 7.63 says that $r$ is the unique
polynomial, reduced mod $\{g_1, \dots, g_m\}$, with $f - r \in I$; similarly, the remainder
$r_\sigma$ of $f$ mod $G_\sigma$ is the unique polynomial, reduced mod $\{g_1, \dots, g_m\}$, with $f -
r_\sigma \in I$. The uniqueness assertion in Corollary 7.63 gives $r = r_\sigma$.

(ii) Proposition 7.62 shows that if $f \in I$, then its remainder is 0. For the
converse, if $r$ is the remainder of $f$ mod $G$, then $f = q + r$, where $q \in I$. Hence,
if $r = 0$, then $f \in I$. •

There are several obvious questions. Do Gröbner bases exist and, if they do,
are they unique? Given an ideal $I$ in $k[X]$, is there an algorithm to find a Gröbner
basis of $I$?

The notion of S-polynomial will allow us to recognize a Gröbner basis, but
we first introduce some notation.

Definition. If $\alpha = (\alpha_1, \dots, \alpha_n)$ and $\beta = (\beta_1, \dots, \beta_n)$ are in $\mathbb{N}^n$, define

\[ \alpha \vee \beta = \mu, \]

where $\mu = (\mu_1, \dots, \mu_n)$ is given by $\mu_i = \max\{\alpha_i, \beta_i\}$.

It is easy to see that $X^{\alpha \vee \beta}$ is the least common multiple of the monomials
$X^\alpha$ and $X^\beta$.


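Definition. Let $f(X), g(X) \in k[X]$, where $\mathrm{LT}(f) = a_\alpha X^\alpha$ and $\mathrm{LT}(g) =
b_\beta X^\beta$. Define

\[ L(f, g) = X^{\alpha \vee \beta}. \]

The S-polynomial $S(f, g)$ is defined by

\[
\begin{aligned}
S(f, g) &= \frac{L(f, g)}{\mathrm{LT}(f)}\, f - \frac{L(f, g)}{\mathrm{LT}(g)}\, g \\
        &= a_\alpha^{-1} X^{(\alpha \vee \beta) - \alpha} f(X) - b_\beta^{-1} X^{(\alpha \vee \beta) - \beta} g(X).
\end{aligned}
\]

Note that $S(f, g) = -S(g, f)$.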
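
As an illustration, here is a short Python sketch of the S-polynomial. The dictionary representation of polynomials, the use of the degree-lexicographic order, and the name `s_polynomial` are assumptions of this sketch rather than anything fixed by the text. The example uses $g_1 = y^2 + z^2$ and $g_2 = x^2y + yz$ from Example 7.61, with exponents listed as $(x, y, z)$.

```python
from fractions import Fraction


def deglex_key(alpha):
    """Sort key for the degree-lexicographic order on exponent tuples."""
    return (sum(alpha), alpha)


def s_polynomial(f, g, key=deglex_key):
    """S(f, g) = (L/LT(f)) f - (L/LT(g)) g, where L = X^(alpha v beta)."""
    alpha, beta = max(f, key=key), max(g, key=key)
    mu = tuple(max(a, b) for a, b in zip(alpha, beta))    # alpha v beta
    result = {}
    for poly, lead, sign in ((f, alpha, 1), (g, beta, -1)):
        shift = tuple(m - l for m, l in zip(mu, lead))    # mu - DEG(poly)
        scale = sign / Fraction(poly[lead])               # +-1 / leading coeff
        for e, c in poly.items():
            t = tuple(x + y for x, y in zip(e, shift))
            result[t] = result.get(t, 0) + scale * c
    return {e: c for e, c in result.items() if c != 0}


g1 = {(0, 2, 0): 1, (0, 0, 2): 1}    # y^2 + z^2
g2 = {(2, 1, 0): 1, (0, 1, 1): 1}    # x^2 y + y z

s12 = s_polynomial(g1, g2)
s21 = s_polynomial(g2, g1)
print(s12, s21)
```

Here $L(g_1, g_2) = x^2y^2$, so $S(g_1, g_2) = x^2(y^2 + z^2) - y(x^2y + yz) = x^2z^2 - y^2z$, and $S(g_2, g_1)$ is its negative.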


Example 7.65. 

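We show that if $f = X^\alpha$ and $g = X^\beta$ are monomials, then $S(f, g) = 0$. Since
$f$ and $g$ are monomials, we have $\mathrm{LT}(f) = f$ and $\mathrm{LT}(g) = g$. Hence,

\[ S(f, g) = \frac{L(f, g)}{\mathrm{LT}(f)}\, f - \frac{L(f, g)}{\mathrm{LT}(g)}\, g = \frac{X^{\alpha \vee \beta}}{f}\, f - \frac{X^{\alpha \vee \beta}}{g}\, g = 0. \quad ◄ \]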

g 


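The following technical lemma indicates why S-polynomials are relevant.

Lemma 7.66. Given $g_1(X), \dots, g_\ell(X) \in k[X]$ and monomials $c_jX^{\alpha(j)}$, let

\[ h(X) = \sum_{j=1}^{\ell} c_jX^{\alpha(j)} g_j(X). \]

Let $\delta$ be a multidegree. If $\mathrm{DEG}(h) \prec \delta$ and $\mathrm{DEG}(c_jX^{\alpha(j)} g_j(X)) = \delta$ for all
$j \leq \ell$, then there are $d_j \in k$ with

\[ h(X) = \sum_{j} d_j X^{\delta - \mu(j)} S(g_j, g_{j+1}), \]

where $\mu(j) = \mathrm{DEG}(g_j) \vee \mathrm{DEG}(g_{j+1})$, and for all $j < \ell$,

\[ \mathrm{DEG}\bigl(X^{\delta - \mu(j)} S(g_j, g_{j+1})\bigr) \prec \delta. \]

Remark. The lemma says that if $\mathrm{DEG}(\sum_j a_jg_j) \prec \delta$, where the $a_j$ are
monomials, while $\mathrm{DEG}(a_jg_j) = \delta$ for all $j$, then $h$ can be rewritten as a linear
combination of S-polynomials, with monomial coefficients, each of whose terms has
multidegree strictly less than $\delta$. ◄

Proof. Let $\mathrm{LT}(g_j) = b_jX^{\beta(j)}$, so that $\mathrm{LT}(c_jX^{\alpha(j)} g_j(X)) = c_jb_jX^\delta$. The
coefficient of $X^\delta$ in $h(X)$ is thus $\sum_j c_jb_j$. Since $\mathrm{DEG}(h) \prec \delta$, we must have
$\sum_j c_jb_j = 0$. Define monic polynomials

\[ u_j(X) = b_j^{-1} X^{\alpha(j)} g_j(X). \]

There is a telescoping sum

\[
\begin{aligned}
h(X) &= \sum_{j=1}^{\ell} c_jX^{\alpha(j)} g_j(X) \\
     &= \sum_{j=1}^{\ell} c_jb_ju_j \\
     &= c_1b_1(u_1 - u_2) + (c_1b_1 + c_2b_2)(u_2 - u_3) + \cdots \\
     &\quad + (c_1b_1 + \cdots + c_{\ell-1}b_{\ell-1})(u_{\ell-1} - u_\ell) \\
     &\quad + (c_1b_1 + \cdots + c_\ell b_\ell)u_\ell.
\end{aligned}
\]

Now the last term $(c_1b_1 + \cdots + c_\ell b_\ell)u_\ell = 0$ because $\sum_j c_jb_j = 0$. Since
$\mathrm{DEG}(c_jX^{\alpha(j)} g_j(X)) = \delta$, we have $\alpha(j) + \beta(j) = \delta$, so that $X^{\beta(j)} \mid X^\delta$ for all
$j$. Hence, for all $j < \ell$, we have $\operatorname{lcm}\{X^{\beta(j)}, X^{\beta(j+1)}\} = X^{\beta(j) \vee \beta(j+1)} \mid X^\delta$; that
is, if we write $\mu(j) = \beta(j) \vee \beta(j+1)$, then $\delta - \mu(j) \in \mathbb{N}^n$. But

\[
\begin{aligned}
X^{\delta - \mu(j)} S(g_j, g_{j+1}) &= X^{\delta - \mu(j)} \Bigl( \frac{X^{\mu(j)}}{\mathrm{LT}(g_j)}\, g_j - \frac{X^{\mu(j)}}{\mathrm{LT}(g_{j+1})}\, g_{j+1} \Bigr) \\
  &= \frac{X^\delta}{\mathrm{LT}(g_j)}\, g_j(X) - \frac{X^\delta}{\mathrm{LT}(g_{j+1})}\, g_{j+1}(X) \\
  &= b_j^{-1} X^{\alpha(j)} g_j - b_{j+1}^{-1} X^{\alpha(j+1)} g_{j+1} \\
  &= u_j - u_{j+1}.
\end{aligned}
\]

Substituting this equation into the telescoping sum gives a sum of the desired
form, where $d_j = c_1b_1 + \cdots + c_jb_j$:

\[
\begin{aligned}
h(X) &= c_1b_1 X^{\delta - \mu(1)} S(g_1, g_2) + (c_1b_1 + c_2b_2) X^{\delta - \mu(2)} S(g_2, g_3) + \cdots \\
     &\quad + (c_1b_1 + \cdots + c_{\ell-1}b_{\ell-1}) X^{\delta - \mu(\ell-1)} S(g_{\ell-1}, g_\ell).
\end{aligned}
\]

Finally, since both $u_j$ and $u_{j+1}$ are monic with leading term of multidegree
$\delta$, we have $\mathrm{DEG}(u_j - u_{j+1}) \prec \delta$. But we have shown that $u_j - u_{j+1} =
X^{\delta - \mu(j)} S(g_j, g_{j+1})$, and so $\mathrm{DEG}(X^{\delta - \mu(j)} S(g_j, g_{j+1})) \prec \delta$, as desired. •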

Let $I = (g_1, \dots, g_m)$. By Proposition 7.62, $\{g_1, \dots, g_m\}$ is a Gröbner basis
of the ideal $I$ if every $f \in I$ has remainder 0 mod $G$ (where $G$ is any $m$-tuple
formed by ordering the $g_i$). The importance of the next theorem lies in
its showing that it is necessary to compute the remainders of only finitely many
polynomials, namely, the S-polynomials, to determine whether $\{g_1, \dots, g_m\}$ is
a Gröbner basis.

Theorem 7.67 (Buchberger). A set [gi, . . . , g m ] is a Grobner basis of 1 = 
(gl, . . . , gm) if and only if S{g p , g q ) has remainder 0 mod G for all p, q, where 
G = [gi, . . . , gm\. 

Proof Clearly, S(g p , g q ). being a linear combination of g p and g q . lies in I. 
Hence, if G = {gi , . . . , g m 1 is a Grobner basis, then 5 (g p , g q ) has remainder 0 
mod G, by Proposition 7.62. 

Conversely, assume that S(g p . g q ) has remainder 0 mod G for all p, q\ we 
must show that every / e 1 has remainder 0 mod G. By Proposition 7.62, it 
suffices to show that if / € /, then LT(g,) | LT(/) for some i. Since f e I = 
(gi, . . . , g m ), we may write / = J) i hjgj, and so 

DEG (/) < max{DEG(/?,g;)}. 
i 

If there is equality, then DEG(/) = DEG (higi) for some i, and so Proposi- 
tion 7.58 gives LT(g,) [ LT(/), as desired. Therefore, we may assume strict 
inequality: DEG (/) -< max, {DEG(/7,g,)}. 

The polynomial / may be written as a linear combination of the g,- in many 
ways. Of all the expressions of the form / = JT hjgi, choose one in which 
S = max, {DEG(/7,g,)} is minimal (which is possible because ^ is a well-order). 
If DEG(/) = 8, we are done, as we have seen above; therefore, we may assume 
that there is strict inequality: DEG(/) -< 8. Write 

f= J2 h i g j + J2 ,ngi - (1) 

j « 

DEG (hjgj)=b DEG (h e g t )<8 

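Now if $\mathrm{DEG}\bigl(\sum_j h_j g_j\bigr) = \delta$, then $\mathrm{DEG}(f) = \delta$, a
contradiction (here and below, $j$ ranges over the indices with
$\mathrm{DEG}(h_j g_j) = \delta$). Therefore, $\mathrm{DEG}\bigl(\sum_j h_j g_j\bigr) \prec \delta$.
But the coefficient of $X^\delta$ in this sum is obtained from its leading terms, so that

\[ \mathrm{DEG}\Bigl(\sum_j \mathrm{LT}(h_j) g_j\Bigr) \prec \delta. \]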

Now $\sum_j \mathrm{LT}(h_j) g_j$ is a polynomial satisfying the hypotheses of Lemma 7.66,
and so there are constants $d_j$ and multidegrees $\mu(j)$ so that

\[ \sum_j \mathrm{LT}(h_j) g_j = \sum_j d_j X^{\delta - \mu(j)} S(g_j, g_{j+1}), \tag{2} \]

where $\mathrm{DEG}\bigl(X^{\delta - \mu(j)} S(g_j, g_{j+1})\bigr) \prec \delta$.$^{10}$

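Since each $S(g_j, g_{j+1})$ has remainder 0 mod $G$, the division algorithm gives
$a_{ji}(X) \in k[X]$ with

\[ S(g_j, g_{j+1}) = \sum_i a_{ji} g_i, \]

where $\mathrm{DEG}(a_{ji} g_i) \preceq \mathrm{DEG}(S(g_j, g_{j+1}))$ for all $j, i$. It follows that

\[ X^{\delta - \mu(j)} S(g_j, g_{j+1}) = \sum_i X^{\delta - \mu(j)} a_{ji} g_i. \]

Therefore, Lemma 7.66 gives

\[ \mathrm{DEG}\bigl(X^{\delta - \mu(j)} a_{ji} g_i\bigr) \preceq
   \mathrm{DEG}\bigl(X^{\delta - \mu(j)} S(g_j, g_{j+1})\bigr) \prec \delta. \tag{3} \]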

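Substituting into Eq. (2), we have

\[ \sum_j \mathrm{LT}(h_j) g_j
   = \sum_j d_j X^{\delta - \mu(j)} S(g_j, g_{j+1})
   = \sum_j d_j \Bigl(\sum_i X^{\delta - \mu(j)} a_{ji} g_i\Bigr)
   = \sum_i \Bigl(\sum_j d_j X^{\delta - \mu(j)} a_{ji}\Bigr) g_i. \]

If we denote $\sum_j d_j X^{\delta - \mu(j)} a_{ji}$ by $h'_i$, then

\[ \sum_j \mathrm{LT}(h_j) g_j = \sum_i h'_i g_i, \tag{4} \]

where, by Eq. (3),

\[ \mathrm{DEG}(h'_i g_i) \prec \delta \quad \text{for all } i. \]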


$^{10}$ The reader may wonder why we consider all $S$-polynomials $S(g_p, g_q)$ instead of only
those of the form $S(g_i, g_{i+1})$. The answer is that the remainder condition is applied only to
those $h_j g_j$ for which $\mathrm{DEG}(h_j g_j) = \delta$, and so the indices viewed as $i$'s need not
be consecutive.
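Finally, we substitute the expression in Eq. (4) into Eq. (1):

\[ \begin{aligned}
f &= \sum_{\substack{j \\ \mathrm{DEG}(h_j g_j) = \delta}} h_j g_j
   + \sum_{\substack{\ell \\ \mathrm{DEG}(h_\ell g_\ell) \prec \delta}} h_\ell g_\ell \\
  &= \sum_{\substack{j \\ \mathrm{DEG}(h_j g_j) = \delta}} \mathrm{LT}(h_j) g_j
   + \sum_{\substack{j \\ \mathrm{DEG}(h_j g_j) = \delta}} [h_j - \mathrm{LT}(h_j)] g_j
   + \sum_{\substack{\ell \\ \mathrm{DEG}(h_\ell g_\ell) \prec \delta}} h_\ell g_\ell \\
  &= \sum_i h'_i g_i
   + \sum_{\substack{j \\ \mathrm{DEG}(h_j g_j) = \delta}} [h_j - \mathrm{LT}(h_j)] g_j
   + \sum_{\substack{\ell \\ \mathrm{DEG}(h_\ell g_\ell) \prec \delta}} h_\ell g_\ell.
\end{aligned} \]

We have rewritten $f$ as a linear combination of the $g_i$ in which each term has
multidegree strictly smaller than $\delta$, contradicting the minimality of $\delta$. This
completes the proof. •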

Corollary 7.68. If $I = (f_1, \dots, f_s)$ in $k[X]$, where each $f_i$ is a monomial (that
is, if $I$ is a monomial ideal), then $\{f_1, \dots, f_s\}$ is a Gröbner basis of $I$.

Proof. By Example 7.65, the $S$-polynomial of any pair of monomials is 0. •

Here is the main result. 

Theorem 7.69 (Buchberger's Algorithm). Every ideal $I = (f_1, \dots, f_s)$ in
$k[X]$ has a Gröbner basis$^{11}$ which can be computed by an algorithm.

Proof. Here is a pseudocode for an algorithm.

    Input: B = {f_1, ..., f_s}   G = [f_1, ..., f_s]
    Output: a Gröbner basis B = {g_1, ..., g_m}
            containing {f_1, ..., f_s}
    B := {f_1, ..., f_s}; G := [f_1, ..., f_s]
    REPEAT
        B' := B; G' := G
        FOR each pair g, g' with g ≠ g' ∈ B' DO
            r := remainder of S(g, g') mod G'
            IF r ≠ 0 THEN
                B := B ∪ {r}; G' := [g_1, ..., g_m, r]
            END IF
        END FOR
    UNTIL B = B'

$^{11}$ A nonconstructive proof of the existence of a Gröbner basis can be given using the proof
of the Hilbert basis theorem; for example, see Section 2.5 of the book of Cox, Little, and
O’Shea (they give a constructive proof in Section 2.7).
Now each loop of the algorithm enlarges a subset $B \subseteq I = (g_1, \dots, g_m)$
by adjoining the remainder mod $G$ of one of its $S$-polynomials $S(g, g')$. As
$g, g' \in I$, the remainder $r$ of $S(g, g')$ lies in $I$, and so the larger set $B \cup \{r\}$ is
contained in $I$.

The only obstruction to the algorithm stopping at some $B'$ is if some $S(g, g')$
does not have remainder 0 mod $G'$. Thus, if the algorithm stops, then
Theorem 7.67 shows that $B'$ is a Gröbner basis.

To see that the algorithm does stop, suppose a loop starts with $B'$ and ends
with $B$. Since $B' \subseteq B$, we have an inclusion of monomial ideals

\[ (\mathrm{LT}(g') : g' \in B') \subseteq (\mathrm{LT}(g) : g \in B). \]

We claim that if $B' \subsetneq B$, then there is also a strict inclusion of ideals. Suppose
that $r$ is a (nonzero) remainder of some $S$-polynomial mod $B'$, and that
$B = B' \cup \{r\}$. By definition, the remainder $r$ is reduced mod $G'$, and so no term of $r$
is divisible by $\mathrm{LT}(g')$ for any $g' \in B'$; in particular, $\mathrm{LT}(r)$ is not divisible by any
$\mathrm{LT}(g')$. Hence, $\mathrm{LT}(r) \notin (\mathrm{LT}(g') : g' \in B')$, by Exercise 7.54 on page 568. On
the other hand, we do have $\mathrm{LT}(r) \in (\mathrm{LT}(g) : g \in B)$. Therefore, if the algorithm
does not stop, there is an infinite strictly ascending chain of ideals in $k[X]$, and
this contradicts the Hilbert basis theorem, for $k[X]$ has the ACC. •
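
The algorithm just described can be sketched in Python. This is only an illustration under stated assumptions, not the text's own formulation: coefficients are rational numbers, the monomial order is fixed to be lexicographic with $x_1 \succ x_2 \succ \cdots$, a polynomial is stored as a dictionary from exponent tuples to coefficients, and pairs are taken from a worklist instead of the REPEAT loop (a common variant). The names `poly`, `remainder`, `s_poly`, and `buchberger` are our own.

```python
from fractions import Fraction
from itertools import combinations

# A polynomial is a dict {exponent tuple: nonzero Fraction}; {} is the zero polynomial.
# Python compares tuples lexicographically, which is the lex order with x1 > x2 > ...

def poly(terms):
    """Build a polynomial from {exponent tuple: number}, dropping zero terms."""
    return {e: Fraction(c) for e, c in terms.items() if c != 0}

def lt(f):
    """Leading term of a nonzero polynomial as (exponent, coefficient)."""
    e = max(f)
    return e, f[e]

def add_scaled(f, g, c, shift):
    """Return f + c * X^shift * g."""
    h = dict(f)
    for e, a in g.items():
        e2 = tuple(i + j for i, j in zip(e, shift))
        v = h.get(e2, 0) + c * a
        if v:
            h[e2] = v
        else:
            h.pop(e2, None)
    return h

def remainder(f, G):
    """Remainder of f mod the list G (division algorithm in several variables)."""
    f, r = dict(f), {}
    while f:
        e, c = lt(f)
        for g in G:
            ge, gc = lt(g)
            if all(i <= j for i, j in zip(ge, e)):      # LT(g) divides LT(f)
                shift = tuple(j - i for i, j in zip(ge, e))
                f = add_scaled(f, g, -c / gc, shift)     # cancel the leading term
                break
        else:
            r[e] = c                                     # no LT(g) divides it
            del f[e]
    return r

def s_poly(f, g):
    """S-polynomial of f and g: scale both up to the lcm of their leading monomials."""
    (fe, fc), (ge, gc) = lt(f), lt(g)
    m = tuple(max(i, j) for i, j in zip(fe, ge))
    h = add_scaled({}, f, 1 / fc, tuple(i - j for i, j in zip(m, fe)))
    return add_scaled(h, g, -1 / gc, tuple(i - j for i, j in zip(m, ge)))

def buchberger(F):
    """Enlarge the list F until every S-polynomial has remainder 0."""
    G = [dict(f) for f in F]
    pairs = list(combinations(range(len(G)), 2))
    while pairs:
        i, j = pairs.pop()
        r = remainder(s_poly(G[i], G[j]), G)
        if r:
            pairs += [(k, len(G)) for k in range(len(G))]
            G.append(r)
    return G
```

For instance, in $k[x, y]$ with $x \succ y$, applying `buchberger` to $xy - 1$ and $y^2 - 1$ adjoins the single remainder $x - y$; dividing $x^2 - 1$ by the resulting list then leaves remainder 0, while dividing $x$ leaves $y$. The sketch makes no attempt at efficiency or at producing a reduced basis.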


Example 7.70. 

The reader may show that $B' = \{y^2 + z^2,\ x^2 y + yz,\ z^3 + xy\}$ is not a Gröbner
basis because $S(y^2 + z^2, x^2 y + yz) = x^2 z^2 - y^2 z$ does not have remainder 0
mod $G'$. However, adjoining $x^2 z^2 - y^2 z$ does give a Gröbner basis $B$ because
all $S$-polynomials in $B$ have remainder 0 mod $B'$. ◄

Theoretically, Buchberger’s algorithm computes a Grobner basis, but the 
question arises how practical it is. In very many cases, it does compute in a 
reasonable amount of time; on the other hand, there are examples in which it 
takes a very long time to produce its output. The efficiency of Buchberger’s 
algorithm is discussed in Section 2.9 of the book by Cox, Little, and O’Shea. 


Corollary 7.71. 

(i) If $I = (f_1, \dots, f_t)$ is an ideal in $k[X]$, then there is an algorithm to
determine whether a polynomial $h(X) \in k[X]$ lies in $I$.

(ii) If $I = (f_1, \dots, f_t)$ and $I' = (f'_1, \dots, f'_s)$ are ideals in $k[X]$, then there is
an algorithm to determine whether $I = I'$.


Proof.
(i) Use Buchberger’s algorithm to find a Gröbner basis $B$ of $I$, and then use the
division algorithm to compute the remainder $r$ of $h$ mod $G$ (where $G$ is any
$m$-tuple arising from ordering the polynomials in $B$). By Corollary 7.64(ii), $h \in I$
if and only if $r = 0$.

(ii) Use Buchberger’s algorithm to find Gröbner bases $\{g_1, \dots, g_m\}$ and
$\{g'_1, \dots, g'_p\}$ of $I$ and $I'$, respectively. By part (i), there is an algorithm to
determine whether each $g'_j \in I$, and $I' \subseteq I$ if each $g'_j \in I$. Similarly, there is
an algorithm to determine the reverse inclusion, and so there is an algorithm to
determine whether $I = I'$. •

One must be careful here. Corollary 7.71 does not begin by saying “If $I$ is
an ideal in $k[X]$”; instead, it specifies a basis: $I = (f_1, \dots, f_t)$. The reason,
of course, is that Buchberger’s algorithm requires a basis as input. For example,
if $I = (h_1, \dots, h_s)$, then the algorithm cannot be used to check whether a
polynomial $f(X)$ lies in the radical $\sqrt{I}$ unless one has a basis of $\sqrt{I}$. (There
do exist algorithms giving bases of $\sqrt{(f_1, \dots, f_t)}$; see the book by Becker and
Weispfenning.)

A Gröbner basis $B = \{g_1, \dots, g_m\}$ can be too large. For example, it follows
from Proposition 7.62 that if $f \in I$, then $B \cup \{f\}$ is also a Gröbner basis of $I$;
thus, one may seek Gröbner bases that are, in some sense, minimal.

Definition. A basis $\{g_1, \dots, g_m\}$ of an ideal $I$ is reduced if

(i) each $g_i$ is monic;

(ii) each $g_i$ is reduced mod $\{g_1, \dots, \widehat{g_i}, \dots, g_m\}$ (the hat means that
$g_i$ is omitted).

Exercise 7.61 on page 580 gives an algorithm for computing a reduced basis
for every ideal $(f_1, \dots, f_t)$. When combined with the algorithm in Exercise 7.63
on page 580, it shrinks a Gröbner basis to a reduced Gröbner basis. It can be
proved that a reduced Gröbner basis of an ideal is unique.

In the special case when each $f_i(X)$ is linear, that is,

\[ f_i(X) = a_{i1} x_1 + \cdots + a_{in} x_n, \]

then the common zeros $\mathrm{Var}(f_1, \dots, f_t)$ are the solutions of a homogeneous
system of $t$ equations in $n$ unknowns. If $A = [a_{ij}]$ is the $t \times n$ matrix of coefficients,
then it can be shown that the reduced Gröbner basis corresponds to the row
reduced echelon form for the matrix $A$ (see Section 10.5 in the book of Becker and
Weispfenning).
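
To make the linear case concrete, here is a small Python sketch of row reduction over the rationals. It is our own illustration (the function name `rref` and the list-of-rows representation are assumptions, not the text's), and it only computes the row reduced echelon form; reading the nonzero rows as linear polynomials is what gives the correspondence mentioned above.

```python
from fractions import Fraction

def rref(rows):
    """Row reduced echelon form of a matrix given as a list of rows; zero rows are dropped."""
    A = [[Fraction(a) for a in row] for row in rows]
    ncols = len(A[0]) if A else 0
    pivot_row = 0
    for col in range(ncols):
        # Look for a row at or below pivot_row with a nonzero entry in this column.
        p = next((i for i in range(pivot_row, len(A)) if A[i][col]), None)
        if p is None:
            continue
        A[pivot_row], A[p] = A[p], A[pivot_row]
        lead = A[pivot_row][col]
        A[pivot_row] = [a / lead for a in A[pivot_row]]          # make the pivot 1
        for i in range(len(A)):
            if i != pivot_row and A[i][col]:
                factor = A[i][col]
                A[i] = [a - factor * b for a, b in zip(A[i], A[pivot_row])]
        pivot_row += 1
    return [row for row in A if any(row)]
```

For example, the system $x + y + z$, $x - y$ has coefficient matrix with rows $(1, 1, 1)$ and $(1, -1, 0)$, whose row reduced echelon form has rows $(1, 0, \tfrac12)$ and $(0, 1, \tfrac12)$; these correspond to the polynomials $x + \tfrac12 z$ and $y + \tfrac12 z$.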

Another special case occurs when $f_1, \dots, f_t$ are polynomials in one variable.
The reduced Gröbner basis obtained from $\{f_1, \dots, f_t\}$ turns out to be their
gcd, and so the euclidean algorithm has been generalized to polynomials in
several variables.
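
For comparison, the one-variable case can be sketched directly with the euclidean algorithm. The following Python fragment is an illustration only; it assumes rational coefficients and represents a polynomial as a list of coefficients, highest degree first (both choices, and the names `strip`, `poly_rem`, `poly_gcd`, are ours).

```python
from fractions import Fraction

def strip(a):
    """Drop leading zero coefficients; the zero polynomial becomes []."""
    i = 0
    while i < len(a) and a[i] == 0:
        i += 1
    return a[i:]

def poly_rem(a, b):
    """Remainder of a divided by a nonzero polynomial b (b[0] must be nonzero)."""
    a = [Fraction(c) for c in a]
    while len(a) >= len(b):
        q = a[0] / b[0]
        for i in range(len(b)):
            a[i] -= q * b[i]
        a.pop(0)                      # the leading coefficient is now 0
    return strip(a)

def poly_gcd(a, b):
    """Monic gcd of two polynomials, not both zero."""
    a = strip([Fraction(c) for c in a])
    b = strip([Fraction(c) for c in b])
    while b:
        a, b = b, poly_rem(a, b)
    return [c / a[0] for c in a]      # divide by the leading coefficient
```

For example, `poly_gcd([1, 0, -1], [1, -3, 2])` computes the gcd of $x^2 - 1$ and $x^2 - 3x + 2$, namely $x - 1$.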
We end this chapter by showing how to find a basis of an intersection of
ideals. Given a system of polynomial equations in several variables, one way
to find solutions is to eliminate variables (van der Waerden, Modern Algebra II,
Chapter XI). Given an ideal $I \subseteq k[X]$, we are led to an ideal in a subset of
the indeterminates, which is essentially the intersection of $\mathrm{Var}(I)$ with a
lower-dimensional plane.

Definition. Let $k$ be a field and let $I \subseteq k[X, Y]$ be an ideal, where $k[X, Y]$ is
the polynomial ring in disjoint sets of variables $X \cup Y$. The elimination ideal is

\[ I_X = I \cap k[X]. \]

For example, if $I = (x^2, xy)$, then a Gröbner basis is $\{x^2, xy\}$ (they are
monomials, so that Corollary 7.68 applies), and $I_x = (x^2) \subseteq k[x]$, while
$I_y = \{0\}$.

Proposition 7.72. Let $k$ be a field and let $k[X] = k[x_1, \dots, x_n]$ have a monomial
order for which $x_1 \succ x_2 \succ \cdots \succ x_n$ (for example, the lexicographic order)
and, for fixed $p > 1$, let $Y = x_p, \dots, x_n$. If $I \subseteq k[X]$ has a Gröbner basis
$G = \{g_1, \dots, g_m\}$, then $G \cap I_Y$ is a Gröbner basis for the elimination ideal
$I_Y = I \cap k[x_p, \dots, x_n]$.

Proof. Recall that $\{g_1, \dots, g_m\}$ being a Gröbner basis of $I = (g_1, \dots, g_m)$
means that for each nonzero $f \in I$, there is $g_j$ with $\mathrm{LT}(g_j) \mid \mathrm{LT}(f)$. Let
$f(x_p, \dots, x_n) \in I_Y$ be nonzero. Since $I_Y \subseteq I$, there is some $g_j(X)$ with
$\mathrm{LT}(g_j) \mid \mathrm{LT}(f)$; hence, $\mathrm{LT}(g_j)$ involves only the “later” variables $x_p, \dots, x_n$.
Let $\mathrm{DEG}(\mathrm{LT}(g_j)) = \beta$. If $g_j$ has a term $c_\alpha X^\alpha$ involving “early” variables $x_i$ with
$i < p$, then $\alpha \succ \beta$, because $x_1 \succ \cdots \succ x_p \succ \cdots \succ x_n$. This is a contradiction,
for $\beta$, the degree of the leading term of $g_j$, is greater than the degree of any other
term of $g_j$. It follows that $g_j \in k[x_p, \dots, x_n]$. Exercise 7.60 on page 580 now
shows that $G \cap k[x_p, \dots, x_n]$ is a Gröbner basis for $I_Y = I \cap k[x_p, \dots, x_n]$. •

We can now give Grobner bases of intersections of ideals. 

Proposition 7.73. Let $k$ be a field, and let $I_1, \dots, I_t$ be ideals in $k[X]$, where
$X = x_1, \dots, x_n$.

(i) Consider the polynomial ring $k[X, y_1, \dots, y_t]$ in $n + t$ indeterminates. If
$J$ is the ideal in $k[X, y_1, \dots, y_t]$ generated by $1 - (y_1 + \cdots + y_t)$ and $y_j I_j$,
for all $j$, then $\bigcap_{j=1}^{t} I_j = J_X$.

(ii) Given Gröbner bases of $I_1, \dots, I_t$, a Gröbner basis of $\bigcap_{j=1}^{t} I_j$ can be
computed.





Proof. 

(i) If $f = f(X) \in J_X = J \cap k[X]$, then $f \in J$, and so there is an equation

\[ f(X) = g(X, Y)\Bigl(1 - \sum_j y_j\Bigr) + \sum_j h_j(X, y_1, \dots, y_t)\, y_j q_j(X), \]

where $g, h_j \in k[X, Y]$ and $q_j \in I_j$. Setting $y_j = 1$ and the other $y$’s equal to
0 gives $f = h_j(X, 0, \dots, 1, \dots, 0)\, q_j(X)$. Note that $h_j(X, 0, \dots, 1, \dots, 0) \in k[X]$,
and so $f \in I_j$. As $j$ was arbitrary, we have $f \in \bigcap I_j$, and so $J_X \subseteq \bigcap I_j$.
For the reverse inclusion, if $f \in \bigcap I_j$, then the equation

\[ f = f\Bigl(1 - \sum_j y_j\Bigr) + \sum_j y_j f \]

shows that $f \in J_X$, as desired.

(ii) This follows from part (i) and Proposition 7.72 if we use a monomial order
in which all the variables in $X$ precede the variables in $Y$. •

Example 7.74. 

Consider the ideal $I = (x) \cap (x^2, xy, y^2) \subseteq k[x, y]$, where $k$ is a field. Even
though it is not difficult to find a basis of $I$ by hand, we shall use Gröbner bases
to illustrate Proposition 7.73. Let $u$ and $v$ be new variables, and define

\[ J = (1 - u - v,\ ux,\ vx^2,\ vxy,\ vy^2) \subseteq k[x, y, u, v]. \]

The first step is to find a Gröbner basis of $J$; we use the lexicographic monomial
order with $x \prec y \prec u \prec v$. Since the $S$-polynomial of two monomials is 0,
Buchberger’s algorithm quickly gives a Gröbner basis$^{12}$ $G$ of $J$:

\[ G = \{v + u - 1,\ x^2,\ yx,\ ux,\ uy^2 - y^2\}. \]

It follows from Proposition 7.72 that a Gröbner basis of $I$ is $G \cap k[x, y]$: all
those elements of $G$ that do not involve the variables $u$ and $v$. Thus,

\[ I = (x) \cap (x^2, xy, y^2) = (x^2, xy). \] ◄


Exercises 

Use the degree-lexicographic monomial order in the following exercises. 

7.55 Let $I = (y - x^2,\ z - x^3)$.

(i) Order $x \prec y \prec z$, and let $\preceq_{\mathrm{lex}}$ be the corresponding monomial order on
$\mathbb{N}^3$. Prove that $[y - x^2,\ z - x^3]$ is not a Gröbner basis of $I$.

(ii) Order $y \prec z \prec x$, and let $\preceq_{\mathrm{lex}}$ be the corresponding monomial order on
$\mathbb{N}^3$. Prove that $[y - x^2,\ z - x^3]$ is a Gröbner basis of $I$.

$^{12}$ This is actually the reduced Gröbner basis given by Exercise 7.63 on page 580.

7.56 Find a Gröbner basis of $I = (x^2 - 1,\ xy^2 - x)$.

7.57 Find a Gröbner basis of $I = (x^2 + y,\ x^4 + 2x^2 y + y^2 + 3)$.

7.58 Find a Gröbner basis of $I = (xz,\ xy - z,\ yz - x)$. Does $x^3 + x + 1$ lie in $I$?

7.59 Find a Gröbner basis of $I = (x^2 - y,\ y^2 - x,\ x^2 y^2 - xy)$. Does $x^4 + x + 1$ lie in $I$?

*7.60 Let $I$ be an ideal in $k[X]$, where $k$ is a field and $k[X]$ has a monomial order.
Prove that if a set of polynomials $\{g_1, \dots, g_m\} \subseteq I$ has the property that, for each
nonzero $f \in I$, there is some $g_i$ with $\mathrm{LT}(g_i) \mid \mathrm{LT}(f)$, then $I = (g_1, \dots, g_m)$.
Conclude, in the definition of Gröbner basis, that one need not assume that $I$ is
generated by $g_1, \dots, g_m$.

*7.61 Show that the following pseudocode gives a reduced basis $Q$ of an ideal
$I = (f_1, \dots, f_t)$.

    Input: P = [f_1, ..., f_t]
    Output: Q = [q_1, ..., q_s]
    Q := P
    WHILE there is q ∈ Q which is
          not reduced mod Q − {q} DO
        select q ∈ Q which is not reduced mod Q − {q}
        Q := Q − {q}
        h := the remainder of q mod Q
        IF h ≠ 0 THEN
            Q := Q ∪ {h}
        END IF
    END WHILE
    make all q ∈ Q monic

7.62 If $G$ is a Gröbner basis of an ideal $I$, and if $Q$ is the basis of $I$ obtained from the
algorithm in Exercise 7.61, prove that $Q$ is also a Gröbner basis of $I$.

*7.63 Show that the following pseudocode replaces a Gröbner basis $G$ with a reduced
Gröbner basis $H$.

    Input: G = {g_1, ..., g_m}
    Output: H
    H := ∅; F := G
    WHILE F ≠ ∅ DO
        select f' from F
        F := F − {f'}
        IF LT(f) ∤ LT(f') for all f ∈ F AND
           LT(h) ∤ LT(f') for all h ∈ H THEN
            H := H ∪ {f'}
        END IF
    END WHILE
    apply the algorithm in Exercise 7.61 to H
Inequalities


We now prove some elementary properties about inequalities of real num- 
bers, and we begin by recording some properties of the set P of all positive real 
numbers. 

(i) P is closed under addition and multiplication; that is, if a, b are in P, then 
a + b is in P and ab is in P . 

(ii) There is a trichotomy : if a is any real number, then exactly one of the 
following is true: 


a is in P; a = 0; —a is in P. 

Definition. For any two real numbers $a$ and $A$, define $a < A$ (also written
$A > a$) to mean that $A - a$ is in $P$. We write $a \le A$ to mean either $a < A$ or
$a = A$.

Proposition A.1. For all $a, b, c \in \mathbb{R}$,

(i) $a \le a$;

(ii) if $a \le b$ and $b \le c$, then $a \le c$;

(iii) if $a \le b$ and $b \le a$, then $a = b$;

(iv) either $a < b$, $a = b$, or $b < a$.

Proof 

(i) We have $a \le a$ because $a - a = 0$.

(ii) If $a \le b$, then $b - a$ is in $P$ or $b = a$; if $b \le c$, then $c - b$ is in $P$ or $c = b$.
There are four cases. If $b - a$ is in $P$ and $c - b$ is in $P$, then $(b - a) + (c - b) = c - a$
is in $P$ and $a \le c$. If $b - a$ is in $P$ and $c = b$, then $c - a$ is in $P$ and $a \le c$. The
other two cases are similar.

(iii) Assume that $a \le b$ and $b \le a$. As in part (ii), there are four easy cases. For
example, if $b - a$ is in $P$ and $a - b$ is in $P$, then $(b - a) + (a - b) = 0$ is in
$P$, a contradiction, so this case cannot occur. If $b - a$ is in $P$ and $b = a$, then
$b - a = b - b = 0$ is in $P$, another contradiction; similarly, $a - b$ is in $P$ and
$a = b$ cannot occur. The only remaining possibility is $a = b$.

(iv) Either $a - b$ is in $P$, $a - b = 0$, or $-(a - b) = b - a$ is in $P$; that is, either
$b < a$, $a = b$, or $a < b$. •

Notice that if $a < b$ and $b < c$, then $a < c$ [for $c - a = (c - b) + (b - a)$ is
a sum of two numbers in $P$ and, hence, lies in $P$]. One often abbreviates these
two inequalities as $a < b < c$. The reader may check that if $a \le b \le c$, then
$a \le c$, with $a < c$ if either inequality $a \le b$ or $b \le c$ is strict.

Proposition A.2. Assume that $b$ and $B$ are real numbers with $b < B$.

(i) If $m$ is positive, then $mb < mB$; if $m$ is negative, then $mb > mB$.

(ii) For any number $N$, positive, negative, or zero, we have

\[ N + b < N + B \quad \text{and} \quad N - b > N - B. \]

(iii) Let $a$ and $A$ be positive numbers. If $a < A$, then $1/a > 1/A$, and,
conversely, if $1/A < 1/a$, then $A > a$.

Proof. (i) By hypothesis, $B - b > 0$. If $m > 0$, then the product of positive
numbers being positive implies that $m(B - b) = mB - mb$ is positive; that is,
$mb < mB$. If $m < 0$, then the product $m(B - b) = mB - mb$ is negative; that
is, $mB < mb$.

(ii) The difference $(N + B) - (N + b)$ is positive, for it equals $B - b$. For the other
inequality, $(N - b) - (N - B) = -b + B$ is positive, and, hence, $N - b > N - B$.

(iii) If $a < A$, then $A - a$ is positive. Hence, $1/a - 1/A = (A - a)/Aa$ is positive,
being the product of the positive numbers $A - a$ and $1/Aa$ (by hypothesis, both
$A$ and $a$ are positive). Therefore, $1/a > 1/A$. Conversely, if $1/A < 1/a$, then
part (i) gives $a = Aa(1/A) < Aa(1/a) = A$; that is, $A > a$. •

For example, since $2 < 3$, we have $-3 < -2$ and $\tfrac13 < \tfrac12$. One should always
look at several particular cases of a formula (even though the validity of these 
few cases does not prove the truth of the formula), for it helps one have a better 
feeling about what is being asserted. 




Pseudocodes 


An algorithm solving a problem is a set of directions which gives the cor- 
rect answer after a finite number of steps, never at any stage leaving the user in 
doubt as to what to do next. The division algorithm is an algorithm in this sense: 
one starts with a and b and ends with q and r. We are now going to treat algo- 
rithms more formally, using pseudocodes, which are general directions that can 
easily be translated into a programming language. The basic building blocks of 
a pseudocode are assignments, looping structures, and branching structures. 

An assignment is an instruction written in the form 

(variable) := (expression). 

This instruction evaluates the expression on the right, using any stored values 
for the variables appearing in it; this value is then stored on the left. Thus, the 
assignment replaces the variable on the left by the new value on the right. 


Example B.l. 

Consider the following pseudocode for the division algorithm.

    1: Input: b ≥ a > 0
    2: Output: q, r
    3: q := 0; r := b
    4: WHILE r ≥ a DO
    5:     r := r − a
    6:     q := q + 1
    7: END WHILE

The meaning of the first two lines is clear; line 3 has two assignments giving
initial values to the variables $q$ and $r$. Let us explain the looping structure
WHILE . . . DO before considering assignments 5 and 6. The general form is

    WHILE (condition) DO
        (action).

Here, action means a sequence of instructions. The loop repeats the action as
long as the condition holds, but it stops either when the condition is no longer
valid or when it is told to end. In the example above, one begins with $r = b$ and
$q = 0$; since $b \ge a$, the condition holds, and so assignment 5 replaces $r = b$ by
$r = b - a$. Similarly, assignment 6 replaces $q = 0$ by $q = 1$. If $r = b - a \ge a$,
this loop repeats this action using the new values of $r$ and $q$ just obtained.

This pseudocode is not a substitute for a proof of the existence of a quotient
and a remainder. Had we begun with it, we would still have been obliged to prove
two things: first, that the loop does stop eventually; second, that the outputs $q$
and $r$ satisfy the desiderata of the division algorithm, namely, $b = qa + r$ and
$0 \le r < a$. ◄
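
As an indication of how directly such pseudocode translates into a programming language, here is one possible Python rendering of the division algorithm by repeated subtraction, with the loop continuing as long as $r \ge a$. The function name `divide` and the error check are our own additions.

```python
def divide(b, a):
    """Return (q, r) with b == q * a + r and 0 <= r < a, by repeated subtraction."""
    if a <= 0 or b < 0:
        raise ValueError("need b >= 0 and a > 0")
    q, r = 0, b          # line 3 of the pseudocode
    while r >= a:        # line 4
        r = r - a        # line 5
        q = q + 1        # line 6
    return q, r
```

For example, `divide(17, 5)` returns `(3, 2)`, since $17 = 3 \cdot 5 + 2$.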

Example B.2. 

Another popular looping structure is REPEAT, written 

REPEAT (action) UNTIL (condition). 

In WHILE, the condition tells when to proceed, whereas in REPEAT, the condi- 
tion tells when to stop. Another difference is that WHILE may not do a single 
step, for the condition is checked before acting; REPEAT always does at least 
one step, for it checks that the condition holds only after it acts. 

For example, consider Newton’s method for finding a real root of a polynomial
$f(x)$. Recall that one begins with a guess $x_1$ for a root of $f(x)$ and,
inductively, defines

\[ x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}. \]

If the sequence $\{x_n\}$ converges (and it may not), then its limit is a root of $f(x)$.
The following pseudocode finds a real root of $f(x) = x^3 + x^2 - 36$ with error
at most .0001.

    Input: x
    Output: x, y, y'
    REPEAT
        y := x^3 + x^2 − 36
        y' := 3x^2 + 2x
        x := x − y/y'
    UNTIL |y| < .0001 ◄
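
A Python version of this REPEAT loop might look as follows. It is a sketch under our own assumptions: the stopping test uses the absolute value $|y|$, a cap on the number of steps (`max_steps`) guards against sequences that do not converge, and the names `newton_root` and `tol` are not from the text. Note that, as in the pseudocode, the test is made after the action, so the loop body always runs at least once.

```python
def newton_root(x, tol=1e-4, max_steps=100):
    """Newton's method for f(x) = x^3 + x^2 - 36, starting from the guess x."""
    for _ in range(max_steps):
        y = x**3 + x**2 - 36       # f at the current x
        dy = 3 * x**2 + 2 * x      # f' at the current x
        if dy == 0:
            raise ZeroDivisionError("derivative vanished; choose another guess")
        x = x - y / dy
        if abs(y) < tol:           # UNTIL: checked only after the step
            return x
    raise RuntimeError("no convergence within max_steps")
```

Starting from the guess 4, the iterates approach the root 3 (indeed, $27 + 9 - 36 = 0$).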
Example B.3.

Here is an example of the repetition structure FOR, written

    FOR each k in K DO (action).

Here, a (finite) set $K = \{k_1, \dots, k_n\}$ is given, and the action consists in
performing the action on $k_1$, then on $k_2$, through $k_n$.

For example,

    FOR each n with 0 < n < 41 DO
        f := n^2 − n + 41
    END FOR ◄
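
In Python, a FOR loop of this kind is a loop over a range. The sketch below assumes the range is $0 < n < 41$, read literally from the pseudocode, and adds a primality check of our own (the helper `is_prime` is not part of the text) to show why this particular polynomial is a popular example.

```python
def is_prime(m):
    """Trial division; adequate for the small values used here."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True

values = {}
for n in range(1, 41):             # FOR each n with 0 < n < 41 DO
    values[n] = n * n - n + 41     # f := n^2 - n + 41

all_prime = all(is_prime(v) for v in values.values())
```

Every value produced is prime, but the pattern fails at the next step: for $n = 41$ one gets $41^2$.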


Example B.4. 

An example of a branching structure is 

IF (condition) THEN (action #1) ELSE (action #2). 

When this structure is reached and the condition holds, then action #1 is taken 
(only once), but if the condition does not hold when this structure is reached, 
then action #2 is taken (only once). One can omit ELSE (action #2), in which 
case the directions are 

IF (condition) THEN (action #1) ELSE do nothing. ◄ 

Here is a pseudocode implementing the Euclidean algorithm.

    Input: a, b
    Output: d
    d := b; s := a
    WHILE s > 0 DO
        rem := remainder after dividing d by s
        d := s
        s := rem
    END WHILE
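
The same loop can be written in Python almost word for word. This is a sketch assuming nonnegative integer inputs; the function name `euclid_gcd` is ours, and `%` plays the role of "remainder after dividing".

```python
def euclid_gcd(a, b):
    """Greatest common divisor of nonnegative integers a and b, not both zero."""
    d, s = b, a
    while s > 0:
        rem = d % s      # remainder after dividing d by s
        d = s
        s = rem
    return d
```

For example, `euclid_gcd(12, 18)` returns 6.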
Hints for Selected Exercises


Hint 1.1 The sum is n 2 . 

Hint 1.2 The sum $1 + \sum_{j=1}^{n} j \cdot j! = (n + 1)!$.

Hint 1.3(ii) Either prove this by induction, or use part (i). 

Hint 1.4 This may be rephrased to say that there is an integer $q_n$ with
$10^n = 9q_n + 1$.

Hint 1.9(ii) One must pay attention to hypotheses. Consider a 3 + b~ if b is 
negative. 

Hint 1.10 There are n + 1 squares on the diagonal, and the triangular areas on 
either side have area 1 1 ■ 

Hint 1.11(i) Compute the area $R$ of the rectangle in two ways.

Hint 1.11(ii) As indicated in Figure 1.3, a rectangle with height $n + 1$ and base
$\sum_{i=1}^{n} i^k$ can be subdivided so that the shaded staircase has area
$\sum_{i=1}^{n} i^{k+1}$, while the area above it is

\[ 1^k + (1^k + 2^k) + (1^k + 2^k + 3^k) + \cdots + (1^k + 2^k + \cdots + n^k). \]

Hint 1.11(iii) Write $\sum_{i=1}^{n} \bigl(\sum_{p=1}^{i} p\bigr) = \tfrac12 \sum_{i=1}^{n} i^2 + \tfrac12 \sum_{i=1}^{n} i$
in Alhazen’s formula, and then solve for $\sum_{i=1}^{n} i^2$ in terms of the rest.

Hint 1.12(i) In the inductive step, use n > 10 implies n > 4. 

Hint 1.12(ii) In the inductive step, use n > 17 implies n > 7.

Hint 1.13 You may assume that the sum of a geometric series $\sum_{n=0}^{\infty} a r^n$, where
$0 \le r < 1$, is $a/(1 - r)$.

Hint 1.14 The base step is the product rule for derivatives. 
Hint 1.15 The inequality 1 + x > 0 allows one to use Proposition A.2.

Hint 1.16 Model your solution on the proof of Proposition 1.11. Replace “even” 
by “multiple of 3” and “odd” by “not a multiple of 3.” 

Hint 1.17 What is the appropriate form of induction to use? 

Hint 1.18 Use Theorem 1.12 and geometric series. 

Hint 1.19 For the inductive step, try adding and subtracting the same term. 

Hint 1.20 If 2 < a < n + 1, then a is a divisor of a + (n + 1)!. Most proofs do 
not use induction! 

Hint 1.24 Check that the properties of addition and multiplication used in the 
proof for real numbers also hold for complex numbers. 

Hint 1.26 Consider $f(x) = (1 + x)^n$ when $x = 1$.

Hint 1.27(i) Consider $f(x) = (1 + x)^n$ when $x = -1$.

Hint 1.28 Take the derivative of $f(x) = (1 + x)^n$.

Hint 1.30(i) Use the triangle inequality and induction on n. 

Hint 1.30(ii) Use the following properties of the dot product: if $u, v \in \mathbb{C}$, then
$|u|^2 = u \cdot u$ and $u \cdot v = |u|\,|v| \cos\theta$, where $\theta$ is the angle between $u$ and $v$.

Hint 1.31 Only odd powers of i are imaginary. 

Hint 1.33(ii) Compare with part (i). 

Hint 1.35 How many selections of 5 numbers are there? 

Hint 1.37 Even though there is a strong resemblance, there is no routine deriva- 
tion of the Leibniz formula from the binomial theorem (there is a derivation using 
a trick of hypergeometric series). 

Hint 1.38(i) $1/z = \bar{z}/z\bar{z}$.

Hint 1.41(i) The polar coordinates of (8, 15) are (17, 62°), and sin 31° ≈ .515
and cos 31° ≈ .857.

Hint 1.41(ii) sin 15.5° ≈ .267 and cos 15.5° ≈ .967.

Hint 1.42 Use the portion of the full division algorithm that has already been 
proved. 

Hint 1.44 19 | f-j, but 7 is not the smallest k. 

Hint 1.46 Use Corollary 1.34. 

Hint 1.47 Write m in base 2. 
Hint 1.49(i) Assume $\sqrt{n} = a/b$, where $a/b$ is in lowest terms, and adapt the
proof of Proposition 1.40.

Hint 1.49(ii) Assume that </2 can be written as a fraction in lowest terms. 

Hint 1.53 If ar + bm = 1 and sr' + tm = 1, consider (ar + bm)(sr' + tm). 
Hint 1.54 If 2s + 3t = 1, then 2 (s + 3) + 3(t — 2) = 1. 

Hint 1.55 Use Corollary 1.37. 

Hint 1.56 If $b > a$, then any common divisor of $a$ and $b$ is also a common divisor
of $a$ and $b - a$.

Hint 1.57 Show that if $k$ is a common divisor of $ab$ and $ac$, then $k \mid a(b, c)$.
Hint 1.59 Use the idea in antanairesis. 

Hint 1.64(ii) Use Corollary 1.50. 

Hint 1.65 The sets of prime divisors of a and b are disjoint. 

Hint 1.66 Assume otherwise, cross-multiply, and use Euclid’s lemma. 

Hint 1.70(i) If neither $a$ nor $b$ is 0, show that $ab/(a, b)$ is a common multiple of
$a$ and $b$ that divides every common multiple $c$ of $a$ and $b$.

Hint 1.72 Cast out 9’s. 

Hint 1.73 $10 \equiv -1 \bmod 11$.

Hint 1.74 100 = 2 • 49 + 2. 

Hint 1.78 Use the fact, proved in Example 1.58, that if $a$ is a perfect square, then
$a^2 \equiv 0$, 1, or 4 mod 8.

Hint 1.79 If the last digit of $a^2$ is 5, then $a^2 \equiv 5 \bmod 10$; if the last two digits of
$a^2$ are 35, then $a^2 \equiv 35 \bmod 100$.

Hint 1.81 Use Euclid’s lemma. 

Hint 1.83 By Exercise 1.55 on page 52, we have $21 \mid (x^2 - 1)$ if and only if
$3 \mid (x^2 - 1)$ and $7 \mid (x^2 - 1)$.

Hint 1.86(i) Consider the parity of a and of b. 

Hint 1.87 Try —4 coconuts. 

Hint 1.88 Easter always falls on Sunday. (There is a Jewish variation of this 
problem, for Yom Kippur must fall on either Monday, Wednesday, Thursday, or 
Saturday; secular variants can involve Thanksgiving Day, which always falls on 
a Thursday, or Election Day, which always falls on a Tuesday.) 
Hint 1.89 The year y = 1900 was not a leap year.

Hint 1.90 On what day did March 1, 1896, fall? 

Hint 1.91(iii) Either use congruences or scan the 14 possible calendars: there 
are 7 possible common years and 7 possible leap years, for January 1 can fall on 
any of the 7 days of the week. 

Hint 2.3(iv) Show that each of A + (B + C) and (A + B) + C is described by 
Figure 2.7. 

Hint 2.4 One of the axioms constraining the $\in$ relation is that the statement

\[ a \in x \in a \]

is always false.

Hint 2.5(i) You may use the facts: (1) lines $\ell_1$ and $\ell_2$ having slopes $m_1$ and $m_2$,
respectively, are perpendicular if and only if $m_1 m_2 = -1$; (2) the midpoint of
the line segment having endpoints $(a, b)$ and $(c, d)$ is $(\tfrac12(a + c), \tfrac12(b + d))$.

Hint 2.6(i) Use Proposition 2.2. 

Hint 2.7 Does g have an inverse? 

Hint 2.9 Either find an inverse or show that / is injective and surjective. 

Hint 2.10 It isn’t. 

Hint 2.11 If $f$ is a bijection, there are $m$ distinct elements $f(x_1), \dots, f(x_m)$
in $Y$, and so $m \le n$; using the bijection $f^{-1}$ in place of $f$ gives the reverse
inequality $n \le m$.

Hint 2.12(i) If $A \subseteq X$ and $|A| = n = |X|$, then $A = X$; after all, how many
elements are in $X$ but not in $A$?

Hint 2.14(i) Compute composites. 

Hint 2.20 Use the complete factorizations of a and of o' . 

Hint 2.21(i) There are r cycle notations for any r-cycle. 

Hint 2.22(i) If $\alpha = (i_0 \dots i_{r-1})$, show that $\alpha^k(i_0) = i_k$ for $k < r$.

Hint 2.22(ii) Use Proposition 2.24. 

Hint 2.24 Use induction on j — i. 

Hint 2.26(i) If $\alpha = (a_1\, a_2 \cdots a_k)(b_1\, b_2 \cdots b_k) \cdots (z_1\, z_2 \cdots z_k)$ is a product of
disjoint $k$-cycles involving all the numbers between 1 and $n$, show that $\alpha = \beta^k$,
where $\beta = (a_1\, b_1 \cdots z_1\, a_2\, b_2 \cdots z_2 \cdots a_k\, b_k \cdots z_k)$.





Hint 2.27(i) First show that $\beta\alpha^k = \alpha^k\beta$ by induction on $k$.

Hint 2.29 Let $\tau = (1\ 2)$, and define $f : A_n \to O_n$, where $A_n$ is the set of all
even permutations in $S_n$ and $O_n$ is the set of all odd permutations, by

\[ f : \alpha \mapsto \tau\alpha. \]

Show that $f$ is a bijection, and conclude that $|A_n| = |O_n|$.

Hint 2.33(i) There are 25 elements of order 2 in $S_5$ and 75 in $S_6$.

Hint 2.33(ii) You may express your answer as a sum not in closed form. 

Hint 2.34 Clearly, (y l ) f> = 1. Use Lemma 2.53 to show that no smaller power 
of y r is equal to 1 . 

Hint 2.37(i) Use induction on k > 1. 

Hint 2.39 Consider the function $f : G \to G$ defined by $f(x) = x^2$.

Hint 2.40 Pair each element with its inverse. 

Hint 2.41 No general formula is known for arbitrary n. 

Hint 2.47 Let G be the four-group V. 

Hint 2.49 If x e HDK, then x m = 1 = x' K K 

Hint 2.50 Can an infinite group have only finitely many cyclic subgroups? 

Hint 2.53 If G 7 ^ ST, find disjoint subsets of G having |S| and |7j elements, 
respectively. 

Hint 2.55(ii) Consider a H Ha~ l . 

Hint 2.56 If $\alpha \in S_X$, define $\varphi(\alpha) = f \circ \alpha \circ f^{-1}$. In particular, show that if
$|X| = 3$, then $\varphi$ takes a cycle involving symbols 1, 2, 3 into a cycle involving $a$,
$b$, $c$, as in Example 2.86.

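Hint 2.67(i) Consider

\[ \varphi : A = \begin{bmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{bmatrix}
   \mapsto (\cos\alpha, \sin\alpha). \]

Hint 2.68 List the prime numbers $p_0 = 2$, $p_1 = 3$, $p_2 = 5$, . . . , and define

\[ \varphi(e_0 + e_1 x + e_2 x^2 + \cdots + e_n x^n) = p_0^{e_0} \cdots p_n^{e_n}. \]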

Hint 2.72 Show that squaring is an injective function $G \to G$, and use
Exercise 2.12 on page 102.

Hint 2.73 Take G = S 3 , H = ((1 2)), and g = (2 3). 





Hint 2.74 Show that if A is a matrix which is not a scalar matrix, then there is 
some nonsingular matrix that does not commute with A. (There is a proof of this 
for n x n matrices given in Proposition 4.85.) 

Hint 2.75(iii) Consider cases A 1 A> , A' BA > , BA l A > , and (BA')(BAi). 

Hint 2.76(i) Note that $A^2 = -I = B^2$.

Hint 2.77 Use Exercise 2.59 on page 166. 

Hint 2.79 Use Proposition 2.95(ii).

Hint 2.80(ii) See Example 2.48(iv). 

Hint 2.80(iii) See Example 2.48(iv). 

Hint 2.81 The vertices $X = \{v_0, \dots, v_{n-1}\}$ of $\pi_n$ are permuted by every
isometry $\varphi \in \Sigma(\pi_n)$.

Hint 2.86(iii) Define $f : H \times K \to H$ by $f : (h, k) \mapsto h$.

Hint 2.87 If G/Z(G) is cyclic, use a generator to construct an element outside 
of Z(G) which commutes with each element of G. 

Hint 2.88 |G| = \G/H\\H\. 

Hint 2.89 Use induction on $n \ge 1$, where $X = \{a_1, \dots, a_n\}$. The inductive step
should consider the quotient group $G/\langle a_{n+1} \rangle$.

Hint 2.94 If H < G and |//| = |, what happens to elements of H in G/K7 

Hint 2.95(i) Use the fact that $H \subseteq HK$ and $K \subseteq HK$.

Hint 2.99 Use Wilson’s theorem. 

Hint 2.100(ii) Use Exercise 2.98. 

Hint 2.102 Use a conjugation. 

Hint 2.105 Use Cauchy’s theorem. 

Hint 2.108 Use Proposition 2.133. 

Hint 2.109(i) Recall that A 4 has no element of order 6. 

Hint 2.109(ii) Each element $x \in D_{12}$ has a unique factorization of the form
$x = b^i a^j$, where $b^6 = 1$ and $a^2 = 1$.

Hint 2.110(ii) Use the second isomorphism theorem.

Hint 2.111 You may use the fact that the only nonabelian groups of order 8 are
$D_8$ and $\mathbf{Q}$.
Hint 2.112(i) There are 8 permutations in $S_4$ commuting with $(1\ 2)(3\ 4)$, and
only 4 of them are even.

Hint 2.113(i) If $\alpha = (1\ 2\ 3\ 4\ 5)$, then $|C_{S_5}(\alpha)| = 5$ because $24 = 120/|C_{S_5}(\alpha)|$;
hence $C_{S_5}(\alpha) = \langle \alpha \rangle$. What is $C_{A_5}(\alpha)$?

Hint 2.116(i) Show that (1 2 3) and (i j k ) are conjugate as in the proof of 
Lemma 2.153. 

Hint 2.117 Use Proposition 2.33, checking the various cycle structures one at a 
time. 

Hint 2.119 Use Proposition 2.95(ii).

Hint 2.121(i) Kernels are normal subgroups. 

Hint 2.121(ii) Use part (i). 

Hint 2.122 Show that G has a subgroup H of order p, and use the representation 
of G on the cosets of H. 

Hint 2.123 If $H$ is a second such subgroup, then $H$ is normal in $S_n$ and hence
$H \cap A_n$ is normal in $A_n$.

Hint 2.125 The parity of n is relevant. 

Hint 2.128(i) The group $G = D_{10}$ is acting. Use Example 2.62 to assign to each
symmetry a permutation of the vertices, and then show that

\[ P_G(x_1, \dots, x_5) = \tfrac{1}{10}\bigl(x_1^5 + 4x_5 + 5x_1 x_2^2\bigr) \]

and

\[ P_G(q, \dots, q) = \tfrac{1}{10}\bigl(q^5 + 4q + 5q^3\bigr). \]

Hint 2.128(ii) The group $G = D_{12}$ is acting. Use Example 2.62 to assign to
each symmetry a permutation of the vertices, and then show that

$P_G(x_1, \dots, x_6) = \tfrac{1}{12}(x_1^6 + 2x_6 + 2x_3^2 + 3x_1^2 x_2^2 + 4x_2^3)$

and so

$P_G(q, \dots, q) = \tfrac{1}{12}(q^6 + 3q^4 + 4q^3 + 2q^2 + 2q)$.
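
The substitution $x_i \mapsto q$ in a cycle index is Burnside counting: each symmetry contributes $q$ raised to its number of cycles. The sketch below is an illustration only. It assumes the vertices of the regular $n$-gon are labeled $0, \dots, n-1$, with rotations $i \mapsto i + k$ and reflections $i \mapsto k - i$ modulo $n$; the function names are hypothetical. For the pentagon it can be compared with $\tfrac{1}{10}(q^5 + 4q + 5q^3)$.

```python
def dihedral_permutations(n):
    """The 2n symmetries of a regular n-gon (n >= 3) as vertex permutations."""
    perms = []
    for k in range(n):
        perms.append(tuple((i + k) % n for i in range(n)))  # rotation by k steps
        perms.append(tuple((k - i) % n for i in range(n)))  # a reflection
    return perms

def cycle_count(perm):
    """Number of cycles of a permutation, counting fixed points as 1-cycles."""
    seen, cycles = set(), 0
    for start in range(len(perm)):
        if start not in seen:
            cycles += 1
            j = start
            while j not in seen:
                seen.add(j)
                j = perm[j]
    return cycles

def count_colorings(n, q):
    """Number of orbits of q-colorings of the vertices under the dihedral group."""
    perms = dihedral_permutations(n)
    total = sum(q ** cycle_count(p) for p in perms)
    return total // len(perms)

for n in (5, 6):
    print(n, [count_colorings(n, q) for q in range(1, 5)])
```

Running the same function with `n = 6` gives an independent check on the hexagon case.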

Hint 3.6(i) You may use some standard facts of set theory: 

$U \cap (V \cup W) = (U \cap V) \cup (U \cap W)$;

if $V'$ denotes the complement of $V$, then

$U - V = U \cap V'$;

the de Morgan laws (Exercise 2.2 on page 100):

$(U \cap V)' = U' \cup V'$ and $(U \cup V)' = U' \cap V'$.

Hint 3.10(ii) If $zw = 0$ and $z = a + ib \neq 0$, then $z\bar{z} = a^2 + b^2 \neq 0$, and

$\left(\dfrac{\bar{z}}{z\bar{z}}\right) z = 1$.

Hint 3.12 Every subring R of Z contains 1. 

Hint 3.13 Use Theorem 1.65. 

Hint 3.14(i) Yes. 

Hint 3.14(ii) No. 

Hint 3.18 If $R^\times$ denotes the set of nonzero elements of $R$, prove that multiplication
by $r$ is an injection $R^\times \to R^\times$, where $r \in R^\times$.

Hint 3.19 Use Corollary 1.20. 

Hint 3.26(i) See Example 2.48(iv). 

Hint 3.27 If $x^{-1}$ exists, what is its degree?

Hint 3.29(i) Compute degrees. 

Hint 3.30 Use Fermat’s theorem. 

Hint 3.31(i) Compare the binomial expansions of $(1 + x)^{pm}$ and of $(1 + x^m)^p$
in $\mathbb{F}_p[x]$.

Hint 3.33 This is not a hard exercise, but it is a long one. 

Hint 3.35(ii) Generalize the example in part (i). 

Hint 3.35(ii) The condition is that there should be a polynomial $g(x) = \sum a_n x^n$
with $f(x) = g(x^p)$; that is, $f(x) = \sum b_n x^{np}$, where $b_n = a_n$ for all $n$.

Hint 3.36(i) The proof for polynomials, Proposition 3.25, works here. 

Hint 3.37(i) If $R$ is a domain and $\sigma, \tau \in R[[x]]$ are nonzero, prove that
$\operatorname{ord}(\sigma\tau) = \operatorname{ord}(\sigma) + \operatorname{ord}(\tau)$, and hence $\sigma\tau$ has an order.

Hint 3.40(ii) First prove that $1 + 1 = 0$, and then show that the nonzero elements
form a cyclic group of order 3 under multiplication.

Hint 3.46 Use the previous exercise to prove that $\varphi$ is a homomorphism.

Hint 3.47(i) Use Exercise 2.12 on page 102. 

Hint 3.50(i) Define $\Phi\colon \operatorname{Frac}(A) \to \operatorname{Frac}(R)$ by $[a, b] \mapsto [\varphi(a), \varphi(b)]$.

Hint 3.53(i) Show that $(r, s)$ is a unit in $R \times S$ if and only if $r$ is a unit in $R$ and
$s$ is a unit in $S$.

Hint 3.53(ii) See Theorem 2.126. 

Hint 3.54(ii) Define $\varphi\colon F \to \mathbb{C}$ by $\varphi(A) = a + ib$.

Hint 3.55 The answer is $x - 2$.

Hint 3.56 Use $\operatorname{Frac}(R)$.

Hint 3.60 Use $\operatorname{Frac}(R)$.

Hint 3.61 See Exercise 1.53 on page 52. 

Hint 3.62 Mimic the proof of Proposition 1.40 which shows that $\sqrt{2}$ is irrational.

Hint 3.63 Use Exercise 3.34 on page 240.

Hint 3.65(ii) The general proof can be generalized from a proof of the special 
case of polynomials. 

Hint 3.67 There are $q, r \in R$ with $b^i = qb^{i+1} + r$.

Hint 3.68(i) If $I$ is a nonzero ideal, choose $\tau \in I$ of smallest order. Use
Exercise 3.37 on page 240 to prove that $I = (\tau)$.

Hint 3.69(i) Example 3.39. 

Hint 3.70 See Exercise 1.70 on page 56. 

Hint 3.73(i) Use a degree argument. 

Hint 3.74 Show that $\sqrt{x + 1}$ is not a polynomial.

Hint 3.75 Let $k$ be a field and let $R$ be the subring of $k[x]$ consisting of all
polynomials having no linear term; that is, $f(x) \in R$ if and only if

$f(x) = s_0 + s_2 x^2 + s_3 x^3 + \cdots$.

Show that $x^5$ and $x^6$ have no gcd.

Hint 3.77(i) See Exercise 3.63 on page 271 and Corollary 3.75. 

Hint 3.78(i) Use Theorem 3.50. 

Hint 3.78(ii) Set $x = a/b$ if $b \neq 0$.

Hint 3.80(vii) Show that $f(x)$ has no roots in $\mathbb{Q}$ and that a factorization of $f(x)$
as a product of quadratics would force impossible restrictions on coefficients.

Hint 3.80(viii) Show that $f(x)$ has no rational roots and that a factorization of
$f(x)$ as a product of quadratics would force impossible restrictions on
coefficients.

Hint 3.82 The irreducible quintics in $\mathbb{F}_2[x]$ are:

$x^5 + x^3 + x^2 + x + 1$        $x^5 + x^4 + x^2 + x + 1$
$x^5 + x^4 + x^3 + x + 1$        $x^5 + x^4 + x^3 + x^2 + 1$
$x^5 + x^3 + 1$                  $x^5 + x^2 + 1$.
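
The list in Hint 3.82 can be confirmed by trial division. In this sketch (an illustration, not the book's method) a polynomial over $\mathbb{F}_2$ is encoded as an integer bitmask, so that bit $i$ is the coefficient of $x^i$; the names `gf2_mod` and `is_irreducible` are hypothetical.

```python
def gf2_mod(a, b):
    """Remainder of a divided by b, for F_2[x] polynomials stored as bitmasks."""
    db = b.bit_length()
    while a.bit_length() >= db:
        # Subtract (XOR) a shifted copy of b to cancel the leading term of a.
        a ^= b << (a.bit_length() - db)
    return a

def is_irreducible(f):
    """Trial division: a reducible f has a factor of degree 1..deg(f)//2."""
    deg = f.bit_length() - 1
    for g in range(2, 1 << (deg // 2 + 1)):  # all polynomials of those degrees
        if gf2_mod(f, g) == 0:
            return False
    return True

# Monic quintics are the bitmasks 32..63 (bit 5 set).
quintics = [f for f in range(32, 64) if is_irreducible(f)]
print([format(f, "06b") for f in quintics])
```

Each printed string lists the coefficients of $x^5, x^4, \dots, x^0$ from left to right, so `100101` stands for $x^5 + x^2 + 1$.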

Hint 3.83(i) Use the Eisenstein criterion. 

Hint 3.84 $f(x) \mapsto f^*(x)$, which reverses coefficients, is not a well-defined
function $k[x] \to k[x]$.

Hint 3.85 This exercise uses group theory. Use induction on the number of 
generators. 

Hint 3.88 Adapt the proof of Exercise 1.55 on page 52. 

Hint 3.89(ii) See Proposition 2.78. 

Hint 3.89(iii) $mn = |\mathbb{I}_{mn}| = |\operatorname{im} \varphi| \le |\mathbb{I}_m \times \mathbb{I}_n| = mn$.

Hint 3.90(i) Adapt the proof of Theorem 1.69. 

Hint 3.90(ii) See the proof of Theorem 2.126. 

Hint 3.91 See Exercise 3.77 on page 278. 

Hint 3.94 Use Exercise 2.53 on page 155. 

Hint 3.95 Show that $F^\times = \langle -1 \rangle \times H$, where $H$ is a group of odd order $m$, say,
and observe that either $2$ or $-2$ lies in $H$ because

$\mathbb{I}_2 \times \mathbb{I}_m = (\{1\} \times H) \cup (\{-1\} \times H)$.

Finally, use Exercise 2.72 on page 166. 

Hint 3.96(ii) Equate like coefficients after expanding the right-hand side. 

Hint 3.96(iii) In the first case, set $a = 0$ and use $b$ to factor $x^4 + 1$. If $a \neq 0$,
then $d = b$ and $b^2 = 1$ (so that $b = \pm 1$); now use $a$ to factor $x^4 + 1$.

Hint 3.96(iv) Use Exercise 3.95 on page 304. 

Hint 3.98 See Exercise 3.26 on page 232. 

Hint 3.102(ii) Exercise 2.105 on page 204. 

Hint 4.3 If $u, v \in V$, evaluate $-[(-v) + (-u)]$ in two ways.

Hint 4.7(i) When are two polynomials equal? 



Hint 4.8 The slope of a vector $v = (a, b)$ is $m = b/a$.

Hint 4.9(ii) Rewrite the vectors $u$, $v$, and $n$ using coordinates in $\mathbb{R}^n$.

Hint 4.11(ii) If $A$ is skew-symmetric, then all its diagonal entries are 0.

Hint 4.12 Use Theorem 3.83. 

Hint 4.13 Prove that $(e_i, e_j) = \delta_{ij}$ for all $i, j$, where $\delta_{ij}$ is the Kronecker delta.

Hint 4.14 Prove that there is some $m$ such that $I, A, A^2, \dots, A^m$ is a linearly
dependent list.

Hint 4.16 Prove that if $v_1 + U, \dots, v_r + U$ is a basis of $V/U$, then the list
$v_1, \dots, v_r$ is linearly independent.

Hint 4.18(ii) Take a basis of $U \cap U'$ and extend it to bases of $U$ and of $U'$.

Hint 4.22(ii) Let $A$ be the matrix whose rows are the given vectors, and see
whether $\operatorname{rank}(A) = m$.


Hint 4.23 If $A$ is the matrix whose rows are $v_1, v_2, v_3$, is $\operatorname{rank}(A) = 3$?


Hint 4.24 If $\gamma \in k^m$, prove that $A\gamma$ is a linear combination of the columns of $A$.


Hint 4.26(ii) Let $A$ be Gaussian equivalent to an echelon matrix $U$, so that there
is a nonsingular matrix $P$ with $PA = U$. Prove that $\beta$ lies in the row space
$\operatorname{Row}(A)$ if and only if $P\beta \in \operatorname{Row}(U)$.


Hint 4.27(ii) If $E_p \cdots E_1 A = I$, then $A^{-1} = E_p \cdots E_1$. Conclude that the
elementary row operations which change $A$ into $I$ also change $I$ into $A^{-1}$. The
answer is

$A^{-1} = \dfrac{1}{4} \begin{bmatrix} 1 & -3 & -1 \\ 1 & 1 & -1 \\ -1 & 3 & 5 \end{bmatrix}$.
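
The procedure in Hint 4.27(ii), row reducing $[A \mid I]$ until the left block is $I$, can be sketched with exact rational arithmetic. The exercise's matrix is not reproduced in these hints, so the matrix `A` below is an assumption, chosen only because its inverse is $\tfrac14$ times the integer matrix with rows $(1, -3, -1)$, $(1, 1, -1)$, $(-1, 3, 5)$; the function name is likewise hypothetical.

```python
from fractions import Fraction

def inverse_by_row_reduction(a):
    """Invert a square matrix by applying elementary row operations to [A | I]."""
    n = len(a)
    # Augment each row of A with the matching row of the identity matrix.
    m = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]      # interchange two rows
        pv = m[col][col]
        m[col] = [x / pv for x in m[col]]        # scale the pivot row to get a 1
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]               # clear the rest of the column
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    # The left block is now I, so the right block is the inverse.
    return [row[n:] for row in m]

A = [[2, 3, 1], [-1, 1, 0], [1, 0, 1]]  # assumed example, not taken from the text
A_inv = inverse_by_row_reduction(A)
for row in A_inv:
    print([str(x) for x in row])
```

Using `Fraction` keeps every intermediate entry exact, which mirrors a hand computation more closely than floating point would.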


Hint 4.34(ii) Here is the statement. If $f\colon V \to W$ is a linear transformation with
$\ker f = U$, then $U$ is a subspace of $V$ and there is an isomorphism $\varphi\colon V/U \to
\operatorname{im} f$, namely, $\varphi(v + U) = f(v)$.


Hint 4.35 Use Theorem 4.61. 


Hint 4.42 See the elementary row operations on page 346. 

Hint 4.45(ii) $0 = 1 - \omega^n = (1 - \omega)(1 + \omega + \omega^2 + \cdots + \omega^{n-1})$.

Hint 4.54 If $B_i = P_i A_i P_i^{-1}$ for all $i$, then $(P_1 \oplus \cdots \oplus P_t)^{-1} = P_1^{-1} \oplus \cdots \oplus P_t^{-1}$.

Hint 4.56 Assume first that $c \in k$.

Hint 4.59 Recall the power series $1/(1 + x) = 1 - x + x^2 - x^3 + \cdots$, where $x$ is a
real number with $|x| < 1$.

Hint 4.64 C is the disjoint union U t6 c

Hint 4.69(i) Use the factorization $x^{15} - 1 = \Phi_1(x)\Phi_3(x)\Phi_5(x)\Phi_{15}(x)$ in $\mathbb{Z}[x]$,
where $\Phi_d(x)$ is the $d$th cyclotomic polynomial.

Hint 4.69(ii) Try $g(x) = x^4 + x + 1$.

Hint 4.69(iii) Show that $\zeta^2$ is a root of $g(x)$.

Hint 4.70 There is an error locator vector of the form (*, 0, 0, *). 

Hint 5.2(ii) As a practical matter, given a monic polynomial in $\mathbb{Q}[x]$, one should
first use Theorem 3.90 to see whether it has any rational (necessarily integral)
roots.

Hint 5.7 Apply complex conjugation to the equation $f(u) = 0$.

Hint 5.8(i) By definition, $\cosh\theta = \tfrac12(e^{\theta} + e^{-\theta})$. Expand and then simplify
$4[\tfrac12(e^{\theta} + e^{-\theta})]^3 - 3[\tfrac12(e^{\theta} + e^{-\theta})]$ and obtain $\tfrac12(e^{3\theta} + e^{-3\theta})$.
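
Before expanding by hand, the identity $4\cosh^3\theta - 3\cosh\theta = \cosh 3\theta$ can be spot-checked numerically. This is only a sanity check under floating-point arithmetic, not a proof; the function names are illustrative.

```python
import math

def cosh_from_definition(theta):
    """cosh(theta) computed directly from (e^theta + e^-theta) / 2."""
    return 0.5 * (math.exp(theta) + math.exp(-theta))

def triple_angle_gap(theta):
    """Difference between 4*cosh^3 - 3*cosh and cosh(3*theta); should be near 0."""
    c = cosh_from_definition(theta)
    return (4 * c**3 - 3 * c) - cosh_from_definition(3 * theta)

for theta in (-1.0, -0.3, 0.0, 0.5, 1.0):
    print(theta, triple_angle_gap(theta))
```

The printed gaps are at the level of rounding error, which is what the algebraic expansion predicts.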

Hint 5.8(ii) By definition, $\sinh\theta = \tfrac12(e^{\theta} - e^{-\theta})$.

Hint 5.9 $r = \cos 3\theta = \cos 3(\theta + 120^\circ) = \cos 3(\theta + 240^\circ)$.

Hint 5.10 The roots are $-4$ and $2 \pm \sqrt{-3}$.

Hint 5.11 The roots are $17$ and $\tfrac12(-1 \pm \sqrt{-3})$.

Hint 5.12(i) The roots appear in unrecognizable form. 

Hint 5.12(ii) The roots are $4$ and $-2 \pm \sqrt{3}$.

Hint 5.13 The roots are $2$ and $-1 \pm \sqrt{3}$.

Hint 5.14 This is a tedious calculation. The roots are $-3$, $-1$, $2 \pm \sqrt{6}$.

Hint 5.19 No. 

Hint 5.20(ii) Use Proposition 1.36. 

Hint 5.20(iii) Use Exercise 2.12 on page 102 to prove the Frobenius $F\colon k \to k$
is surjective when $k$ is finite.

Hint 5.21(ii) Use Proposition 3.116. 

Hint 5.21(iii) Prove that if $\sigma \in G$, then $\sigma$ is completely determined by $\sigma(\alpha)$,
which is a root of the irreducible polynomial of $\alpha$.

Hint 5.21(iv) Prove that $F$ has order $\ge n$.

Hint 5.24 Observe that $x^{30} - 1 = (x^6 - 1)^5$ in $\mathbb{F}_5[x]$.
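
The identity in Hint 5.24 holds because every middle binomial coefficient of a fifth power is divisible by 5. A small sketch (the function name and its parameters are illustrative, not from the text) expands $(x^6 - 1)^5$ with the binomial theorem and reduces the coefficients mod 5:

```python
from math import comb

def expand_power_mod_p(p=5, step=6):
    """Nonzero coefficients of (x^step - 1)^p mod p, as {exponent: coefficient}."""
    coeffs = {}
    for k in range(p + 1):
        # Binomial theorem: the term C(p, k) * x^(step*k) * (-1)^(p-k).
        c = comb(p, k) * (-1) ** (p - k) % p
        if c:
            coeffs[step * k] = c
    return coeffs

print(expand_power_mod_p())
```

Only the exponents 0 and 30 survive, and the constant term is $4 \equiv -1 \pmod 5$, so the expansion is $x^{30} - 1$.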

Hint 5.28(i) If $\alpha$ is a real root of $f(x)$, then $\mathbb{Q}(\alpha)$ is not the splitting field of
$f(x)$.

Hint 5.28(ii) Use part (i).

Hint 5.28(iii) Try $g(x) = 3x^3 - 3x + 1$.

Hint 5.29(ii) Use Exercise 3.63 on page 271. 

Hint 5.31(i) Consider $f(x) = x^p - t \in \mathbb{F}_p(t)[x]$.

Hint 6.5(ii) Use part (i). 

Hint 6.5(iii) Use part (i). 

Hint 6.7 There are 14 groups. 

Hint 6.9 If $B$ is a direct sum of $k$ copies of a cyclic group of order $p^n$, then how
many elements of order $p^n$ are in $B$?

Hint 6.10 If $A$ and $B$ are nonzero subgroups of $\mathbb{Q}$, then $A \cap B \neq \{0\}$.

Hint 6.11(i) Use the proof of the basis theorem (Theorem 6.11).

Hint 6.12 Assume first that G is primary. 

Hint 6.13 If $F$ is a direct sum of $m$ infinite cyclic groups, prove that $F/2F$ is an
$m$-dimensional vector space over $\mathbb{F}_2$.

Hint 6.16 Consider $S_3 \times S_3$.

Hint 6.18 If $g \in G$, then $gPg^{-1}$ is a Sylow $p$-subgroup of $K$, and so it is
conjugate to $P$ in $K$.

Hint 6.19 It suffices to find a subgroup of $S_6$ of order 16. Consider the disjoint
union $\{1, 2, 3, 4, 5, 6\} = \{1, 2, 3, 4\} \cup \{5, 6\}$, and use Exercise 2.95 on page 188.

Hint 6.20 Use the fact that any other Sylow $p$-subgroup of $G$ is conjugate to $P$.

Hint 6.21 Compute the order of the subgroup generated by the Sylow subgroups. 

Hint 6.22(i) Show that $[G/H : HP/H]$ and $[H : H \cap P]$ are prime to $p$.

Hint 6.22(ii) Choose a subgroup $H$ of $S_4$ with $H \cong S_3$, and find a Sylow
3-subgroup $P$ of $S_4$ with $H \cap P = \{1\}$.

Hint 6.24 Some of these are not tricky. 

Hint 6.25 Adapt the proof of the primary decomposition. 

Hint 6.27 By Cauchy’s theorem, $G$ must contain an element $a$ of order $p$, and
$\langle a \rangle \lhd G$ because it has index 2.

Hint 6.28(i) Every independent subset can be extended to a basis. 

Hint 6.28(ii) The group $\mathrm{GL}(r, k)$ acts on the set $X$ of all linearly independent
$r$-lists in $(\mathbb{F}_q)^n$; use the proof of Theorem 6.30.

Hint 6.39(i) If $\rho(z) = e^{i\theta} z$, define $R(z) = e^{i\alpha} z$, where $\alpha = \tfrac12(2\pi - \theta)$.

Hint 6.42 Prove that every $g \in G$ has a unique expression $g = a^i b^j$, where
$i \in \{0, 1\}$ and $j \in \mathbb{Z}$.

Hint 7.2 When is a Boolean ring a domain? 

Hint 7.4(ii) Let $f\colon \mathbb{Z} \to \mathbb{I}_4$ be the natural map, and take $Q = \{0\}$.

Hint 7.14(ii) If $I$ and $J$ are coprime, there are $a \in I$ and $b \in J$ with $1 = a + b$.
If $r, r' \in R$, prove that $(d + I, d + J) = (r + I, r' + J) \in R/I \times R/J$, where
$d = r'a + rb$.
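
Hint 7.14(ii) is the Chinese remainder theorem in ideal form. As an illustration, take the special case $R = \mathbb{Z}$, $I = (m)$, $J = (n)$ with $\gcd(m, n) = 1$; this specialization and the function names are assumptions made for the sketch, not part of the exercise. The extended Euclidean algorithm supplies $a \in I$ and $b \in J$ with $a + b = 1$, and then $d = r'a + rb$.

```python
def extended_gcd(x, y):
    """Return (g, s, t) with g = gcd(x, y) = s*x + t*y."""
    old_r, r = x, y
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_r, old_s, old_t

def crt_element(r, r_prime, m, n):
    """Find d with d = r (mod m) and d = r_prime (mod n), for coprime m and n."""
    g, s, t = extended_gcd(m, n)
    if g != 1:
        raise ValueError("the ideals (m) and (n) are not coprime")
    a = s * m  # a lies in I = (m)
    b = t * n  # b lies in J = (n), and a + b = 1
    return r_prime * a + r * b

d = crt_element(2, 3, 5, 7)
print(d % 5, d % 7)
```

Modulo $m$ the term $r'a$ vanishes and $b \equiv 1$, so $d \equiv r$; modulo $n$ the roles are exchanged, so $d \equiv r'$.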

Hint 7.15(ii) You may assume that every nonunit in a commutative ring lies in
some maximal ideal (this result is proved using Zorn’s lemma).

Hint 7.29 Use the proof of the Hilbert basis theorem, but replace the degree of a
polynomial by the order of a power series (where the order of a nonzero power
series is the index of its first nonzero coefficient).

Hint 7.33 Use Exercise 1.3(i) on page 13. 

Hint 7.34 If $f^r \in I$ and $g^s \in I$, prove that $(f + g)^{r+s} \in I$.